Executive summary

Research goal

The aim of this analysis is first to evaluate the effectiveness of surveillance cameras in reducing crime overall as well as specific types of crime. In a second phase, we consider what determines the CCTV density of a given area.

Methodology

We carried out our analyses using five main data sets retrieved from Baltimore’s open data portal. They allowed us to compute, among other things, figures on overall crime and on the different types of crime committed in Baltimore, as well as the location of surveillance cameras. In addition to representing this information in charts and tables, we made extensive use of maps, which let us visualise it in a more contextual way.

Main Takeaways

  • Analysing our data sets, we found that CCTVs are mostly concentrated in the city centre. Crime per capita is also highest in the inner city, suggesting a correlation between crime per capita and CCTV density. Over the 2014-2019 period, crime (in absolute numbers) increased slightly, although there has been a downward trend since a peak in 2017.

  • CCTVs are not significantly correlated with crime reduction. Furthermore, we observe that crimes are sometimes committed right in front of surveillance cameras, leading us to believe that CCTVs are not effective crime deterrents. We reach a similar conclusion for every specific type of crime.

  • The most important determinant of CCTV density in a given area is neither race, education nor socio-economic condition, but violent crime per capita.

Conclusion

Our analysis does not demonstrate that CCTVs can effectively deter crime in general, or any specific type of crime in particular. We also find that the CCTV density of an area is not significantly correlated with the poverty level of the area, nor with any of the other educational, socio-economic or racial indicators we considered. We have determined that violent crime per capita appears to be a good indicator of the CCTV density of a given area.

Introduction

Overview and motivation

Video surveillance (CCTV) is a technology that is nowadays deeply woven into the everyday life of many people, as one tends to expect it in many varied circumstances (Ossola, 2019). The rationale behind the installation of these systems seems to be very clear for governments. For example, on Buffalo’s (NY) open data website, one can read that “the City of Buffalo deploys a real-time, citywide video surveillance system to augment the public safety efforts of the Buffalo Police Department”. Yet the development of this technology is not free of controversy. For instance, many observers claim that the expansion of video surveillance poses an unregulated threat to privacy (ACLU, 2021). Still, many people seem to be willing to accept this loss of privacy as the surge in video surveillance makes them feel safer (Madden & Rainie, 2015).

Throughout this research, we challenge the widespread belief that people who have “nothing to hide” should be content with the expansion of CCTV networks because it makes them safer (Madden & Rainie, 2015). Indeed, on top of the many privacy issues linked with this surge in video surveillance systems, one might legitimately ask whether these cameras actually make people safer.

The goal of the first phase of this project is to investigate the crime deterrent potential of CCTVs in an American city. This potential will also be compared across the different types of crime committed in this area. In a second phase, the distribution of CCTVs within the city will be investigated. Indeed, according to some research, mass surveillance has a stronger impact on communities already disadvantaged by their poverty, race, religion, ethnicity, or immigration status (Gellman & Adler-Bell, 2017). We would like to see whether our data enables us to validate or invalidate this theory. It would also be extremely interesting, even though challenging, to see whether the installation of surveillance systems could potentially create even more pernicious issues such as crime displacement (Waples, Gill & Fisher, 2009).

In sum we argue that, in a world where CCTVs and other surveillance systems are flourishing, it might be beneficial to take a step back and question both the efficacy and the implementation design of such technologies, since they are often portrayed by different stakeholders as miraculous solutions to very complex issues.

Backgrounds

Augustin: Augustin obtained a degree in Business Administration at the University of St-Gallen where he had the opportunity to develop a strong interest in digital business ethics. He wrote his bachelor’s thesis on the privacy implications of the use of fear appeals in home surveillance devices’ marketing strategy.

Marine: Marine completed a bachelor’s degree in Law at the UBO (Université de Bretagne-Occidentale). She is currently enrolled in the Master DCS (Droit, Criminalité et Sécurité des technologies de l’information) at the University of Lausanne. Last year, she had the opportunity to take a data protection course and learn more about cyber security and crime in general.

Daniel: Daniel is an exchange student from Koblenz, Germany. Daniel obtained a bachelor’s degree in Business Administration/Management at the WHU - Otto Beisheim School of Management, Germany. He is currently pursuing a Master of Management with a focus on family businesses, entrepreneurship and data science in his courses. Of particular relevance to this project, Daniel spent several months in the United States after high school and can thus relate to the topic of police violence and crime in the US.

Motivations

Firstly, from our respective backgrounds, we derive a strong interest in new technologies and privacy. We believe that every person is entitled to the fundamental right to privacy. Unfortunately, one observes an increasing tendency of governments and other stakeholders (e.g. businesses such as GAFA (Google, Amazon, Facebook, Apple)) to take more and more control in our daily lives through digital technologies such as cameras, computers or smartphones. For these reasons it is interesting to ask ourselves if this massive collection of our data leads to more security or more restrictions of our freedom.

Secondly, under European law such as the GDPR, the collection and processing of our data must be proportionate to the purpose of that processing. We are therefore interested in determining whether the same principle applies in the United States, and whether installing cameras for security purposes really reduces crime and makes a city safer.

Research questions

We have summarised our objectives in these four research questions:

  1. Does the presence of CCTVs in a given area actually deter crime?
  2. What types of crimes may be deterred by surveillance cameras?
  3. Is the impact of CCTV installation on crime reduction higher/lower/same in higher income neighborhoods compared to lower income neighborhoods?
  4. Are there more public cameras in lower income/higher unemployment areas compared to higher income/employment areas? (Does the government respect privacy issues depending on your income level?)

Data

Data source

We have five main raw data sets. All data sets were retrieved from the Baltimore government’s open data portal. We found data about crimes committed in Baltimore, CCTV locations in the city and poverty level. We also found a data set showing the reference boundaries of the Community Statistical Area geographies. The latter will be helpful to match each data set’s observations together. Finally, we also obtained a data set containing the population of each community.

Raw Data sets

2.1 Crime Data set

This dataset represents the location and characteristics of major crimes against persons such as homicide, shooting, robbery, aggravated assault etc. within the city of Baltimore. This data set contains 350’294 observations.

  • RowID = ID of the row, 350’294 in total

  • CrimeDateTime = date and time of the crime. Format yyyy/mm/dd hh:mm:sstzd

  • CrimeCode = Code corresponding to the type of crime committed

  • Location = Textual information on where the crime was committed

  • Description = Textual description of the crime committed corresponding to a CrimeCode.

  • Weapon = Provides details on what weapon has been used, if any

  • Post = Number corresponding to the Police Post concerned. A map with corresponding police posts can be found here: http://moit.baltimorecity.gov/sites/default/files/police_districts_w_posts.pdf

  • District = Name of the district, regrouping different neighbourhoods. Baltimore is officially divided into nine geographical regions: North, Northeast, East, Southeast, South, Southwest, West, Northwest, and Central.

  • Neighborhood = Name of the neighborhood in which the crime was committed. Most names match the neighborhood names contained in the dataset about Community Statistical Areas.

  • Latitude = Latitude, Coordinate system: EPSG:4326 WGS 84

  • Longitude = Longitude, Coordinate system: EPSG:4326 WGS 84

  • GeoLocation = Combination of latitude and longitude, Coordinate system: EPSG:4326 WGS 84

  • Premise = Information on the premise where the crime was committed.

crime_data <- read.csv(file = here::here("data/Baltimore_Part1_Crime_data.csv"))

Source of the data set:
[https://data.baltimorecity.gov/datasets/part1-crime-data/explore]

2.2 CCTV Data set

This dataset represents closed circuit camera locations capturing activity within 256ft (~2 blocks). It contains 837 observations in total.

  • X = Longitude: Coordinate system: EPSG:3857 WGS 84 / Pseudo-Mercator

  • Y = Latitude: Coordinate system: EPSG:3857 WGS 84 / Pseudo-Mercator

  • OBJECTID = ID of the camera, 837 in total

  • CAM_NUM = Unique number attributed to the camera. This might suggest that the data set does not show the location of every camera in Baltimore.

  • LOCATION = Textual information on where the camera is located

  • PROJ = Name of the area in which the camera is located. It does not always match the name of the “standard” community statistical areas.

  • XCOORD = Longitude, Coordinate system: EPSG:4326 WGS 84

  • YCOORD = Latitude, Coordinate system: EPSG:4326 WGS 84

cctv_data <- read.csv(file = here::here("data/Baltimore_CCTV_Locations_Crime_Cameras.csv"))

Source of the data set:
[https://data.baltimorecity.gov/datasets/cctv-locations-crime-cameras/explore]

2.3 Poverty Data set

This dataset provides information about the percent of family households living below the poverty line. This indicator measures the percentage of households whose income fell below the poverty threshold out of all households in an area.

Federal and state governments use such estimates to allocate funds to local communities. Local communities use these estimates to identify the number of individuals or families eligible for various programs. This information will be useful for us to study the distribution of CCTVs within Baltimore in comparison to the poverty level in a given area. This dataset contains 55 observations, one percentage for each community statistical area. There seems to be only one NA. The most relevant variables are the following:

  • CSA2010 = name of the community statistical area. The Baltimore Data Collaborative and the Baltimore City Department of Planning divided Baltimore into 56 CSAs. These 56 units combine Census Bureau geographies together in ways that match Baltimore’s understanding of community boundaries, and are used in social planning.

  • hhpov15 - hhpov19 = each of these five columns contains the percent of Family Households Living Below the Poverty Line for a given year, from 2015 to 2019.

  • Shape_Area - Shape_Length = standard fields to determine the area and the perimeter of a polygon

poverty_data <- read.csv(file = here::here("data/Percent_of_Family_Households_Living_Below_the_Poverty_Line.csv"))

Source of the data set:
[https://arcg.is/1qOrnH]

2.4 Area Data set

This dataset provides information about the Community Statistical Area geographies for Baltimore City, which are based on aggregations of Census tract (2010) geographies. It will serve as a geographical point of reference for us to match each dataset’s observations together. This dataset contains 56 observations, one for each area. The most relevant variables are the following:

area_data <- read_csv(file = here::here("data/Community_Statistical_Areas__CSAs___Reference_Boundaries.csv"))

Source of the data set:
[https://data.baltimorecity.gov/datasets/community-statistical-area-1/explore?location=39.284605%2C-76.620550%2C12.26]

2.5 Population Data set

This data set provides information about the population in each Community Statistical Area. Information about the total population in 2010 and 2020 are provided. It will be useful to calculate values per capita in each community. The most relevant variables are the following:

  • community = name of the community statistical area. The Baltimore Data Collaborative and the Baltimore City Department of Planning divided Baltimore into 56 CSAs. These 56 units combine Census Bureau geographies together in ways that match Baltimore’s understanding of community boundaries, and are used in social planning.

  • tpop20 = total population of each Community Statistical Area in 2020

population_data <- read.csv(file = here::here("data/Total_Population.csv"))

Source of the data set:
[https://arcg.is/01eimm0]

2.6 Data Wrangling

2.6.1 Data Wrangling: Area

Here, the main goal is to transform the area data set into a new data set which contains one observation per neighborhood. Indeed, it is important to distinguish neighborhoods, which are smaller areas, from communities, which are larger and often contain several neighborhoods. We achieve this by first creating a new data set in which each neighborhood is assigned to a community using separate_rows, and second adding a new column with lower-case names for the later merge. To do so, we combine the mutate function with tolower, which converts the upper-case letters of a string to lower case.

area_data2 <- separate_rows(area_data, Neigh, sep = ", ") #Creation of a new data set with each neighborhood being assigned to an area

area_data2 <- mutate(area_data2,neigh=tolower(Neigh)) #Creation of new column with lower case letters
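To make the reshaping step concrete, here is a minimal base-R sketch of what separate_rows followed by tolower does. The data frame toy_area and its community and neighborhood names are invented for illustration; only the column names Community and Neigh follow the area data set.

```r
# Toy stand-in for area_data: one row per community, with its
# neighborhoods packed into a single comma-separated string.
# All names below are hypothetical.
toy_area <- data.frame(
  Community = c("Community A", "Community B"),
  Neigh = c("Alpha Heights, Beta Park", "Gamma Hill, Delta Square, Epsilon Row"),
  stringsAsFactors = FALSE
)

# Split each string on ", " (the same separator used with separate_rows).
parts <- strsplit(toy_area$Neigh, ", ", fixed = TRUE)

# Repeat each community once per neighborhood it contains.
toy_long <- data.frame(
  Community = rep(toy_area$Community, lengths(parts)),
  Neigh = unlist(parts),
  stringsAsFactors = FALSE
)

# Lower-case key used later for joining.
toy_long$neigh <- tolower(toy_long$Neigh)

print(toy_long)
```

The result has one row per neighborhood, each carrying the community it belongs to, which is the shape needed for the join with the crime data.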

2.6.2 Data Wrangling: Crime

As neighborhood names in the crime data set contain upper-case letters, we again use the tolower function to be able to match this data set with the area data set. We then match them using left_join. Next, we use the anti_join function to find out which observations have not matched. The outcome shows all the neighborhoods which did not match. As shown below, the issues mostly come from spelling differences (e.g. Mount written Mt.). As very few names do not match, we change them manually.

  • mount washington → Mt. Washington
  • carroll - camden industrial area → Caroll-Camden Industrial Area
  • patterson park neighborhood → Patterson Park
  • glenham-belhar → Glenham-Belford
  • new southwest/mount clare → Hollins Market
  • mount winans → Mt. Winans
  • rosemont homeowners/tenants → Rosemont
  • broening manor → O’Donnell Heights
  • boyd-booth → Booth-boyd
  • lower herring run park → Herring Run Park
  • mt pleasant park → Mt. Pleasant Park

crime_data <- mutate(crime_data,neigh=tolower(crime_data$Neighborhood)) #Creation of new column with lower case letters

crime_data_with_areas <- crime_data %>% 
  left_join(area_data2,by="neigh") #We create a new data set that contains the name of the area in which the crime was committed

crime_data_NAs <- crime_data %>% 
  anti_join(area_data2,
            by="neigh") #Here is the list of all the NAs we have

unique(crime_data_NAs$neigh) #We see that we have very few unassigned names, we can change this by hand.

crime_data["neigh"][crime_data["neigh"]=="mount washington"] <- "mt. washington"
crime_data["neigh"][crime_data["neigh"]=="carroll - camden industrial area"] <- "caroll-camden industrial area"
crime_data["neigh"][crime_data["neigh"]=="patterson park neighborhood"] <- "patterson park"
crime_data["neigh"][crime_data["neigh"]=="glenham-belhar"] <- "glenham-belford"
crime_data["neigh"][crime_data["neigh"]=="new southwest/mount clare"] <- "hollins market"
crime_data["neigh"][crime_data["neigh"]=="mount winans"] <- "mt. winans"
crime_data["neigh"][crime_data["neigh"]=="rosemont homeowners/tenants"] <- "rosemont"
crime_data["neigh"][crime_data["neigh"]=="broening manor"] <- "o'donnell heights"
crime_data["neigh"][crime_data["neigh"]=="boyd-booth"] <- "booth-boyd"
crime_data["neigh"][crime_data["neigh"]=="lower herring run park"] <- "herring run park"
crime_data["neigh"][crime_data["neigh"]=="mt pleasant park"] <- "mt. pleasant park"
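The manual renaming above can also be written with a single lookup vector, which keeps all corrections in one place and makes it easy to check that nothing is left unmatched. The following is a sketch on toy vectors, not the code we ran; the names neigh, reference and fixes are hypothetical, and only a few of the spellings listed above are reused.

```r
# Toy neighborhood column from the crime data (already lower-cased).
neigh <- c("mount washington", "rosemont homeowners/tenants",
           "patterson park neighborhood", "hollins market")

# Toy reference spellings, standing in for area_data2$neigh.
reference <- c("mt. washington", "rosemont", "patterson park", "hollins market")

# Names that have no counterpart in the reference (what anti_join reveals).
unmatched_before <- setdiff(unique(neigh), reference)

# Lookup vector: names are the old spellings, values the corrected ones.
fixes <- c(
  "mount washington"            = "mt. washington",
  "rosemont homeowners/tenants" = "rosemont",
  "patterson park neighborhood" = "patterson park"
)

# Replace only the entries that appear in the lookup vector.
needs_fix <- neigh %in% names(fixes)
neigh[needs_fix] <- unname(fixes[neigh[needs_fix]])

# After recoding, every name should be found in the reference.
unmatched_after <- setdiff(unique(neigh), reference)
print(unmatched_before)
print(unmatched_after)
```

Checking unmatched_after at the end is a cheap safeguard against a typo in one of the corrected spellings.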

#We got rid of the 764 remaining observations which had no information about neighbourhood

We get rid of the 764 remaining observations which had no information about neighborhood. This represents a very small portion of our total number of observations. Then, we use the semi_join function to create the final data set, which is the original data set minus the 764 observations that we dropped.

We also want to get rid of observations dating back to before the year 2000, as the Baltimore CCTV program started in 2000. We first check the structure of the data set using the str function. We notice that the CrimeDateTime column is not a date. We convert it and finally keep the observations we want using filter.

crime_data_with_areas <- crime_data %>% 
 semi_join(area_data2,by="neigh") %>% 
  left_join(area_data2,by="neigh") #Here we have the final data frame with a community for each crime

str(crime_data_with_areas) # We see that the crime CrimeDateTime column is not a date. We thus convert it.

crime_data_with_areas$CrimeDateTime <-  as.Date(crime_data_with_areas$CrimeDateTime)

crime_data_with_areas <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2000-01-01")) #We had 24 observations dating back to before the year 2000 and 24 observations with no date. We only keep crimes committed from 2000 onwards, as the CCTV program in Baltimore started in 2000. Note that filter also drops the rows with a missing date.
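The date handling can be illustrated on a few made-up timestamps. This sketch assumes the yyyy/mm/dd layout described for CrimeDateTime and passes the format explicitly rather than relying on the default parsing of as.Date; the values in raw_dt are hypothetical.

```r
# Hypothetical timestamps in the yyyy/mm/dd hh:mm:ss+tz layout, plus one NA.
raw_dt <- c("2019/05/03 14:30:00+00",
            "1999/12/31 23:00:00+00",
            NA,
            "2014/01/01 00:00:00+00")

# Only the date part is parsed; the trailing time and offset are ignored.
crime_date <- as.Date(raw_dt, format = "%Y/%m/%d")

# A comparison with NA yields NA, so missing dates are excluded explicitly
# here (dplyr::filter drops such rows on its own).
keep <- !is.na(crime_date) & crime_date >= as.Date("2000-01-01")
kept_dates <- crime_date[keep]

print(kept_dates)
```

With these toy values, the pre-2000 entry and the missing entry are both removed, mirroring the two groups of 24 observations dropped from the real data.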

2.6.3 Data Wrangling: Poverty

The standard community statistical area system includes 56 areas, one of which is the jail. The poverty data, however, only covers 55 statistical areas, as we obviously have no data about poverty in jail. To solve this inconsistency, we add a new row. Moreover, we needed to fill a missing value for South Baltimore in the year 2019: we decided to take the average of the previous years, as the value seemed relatively stable over time.

poverty_data <- rbind(poverty_data,list(56,"Unassigned -- Jail",0,0,0,0,0,0,0))

poverty_data[48,7] <- c(poverty_data[48,3],poverty_data[48,4],poverty_data[48,5],poverty_data[48,6]) %>% mean() #The poverty level of South Baltimore in 19 was missing. This area's level over the past years seems to be stable (always one of the richest area), that's why we compute the mean of the past 4 years to replace the missing value.
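The same imputation can be written without hard-coded row and column positions, which is less fragile if the file layout changes. This is a sketch on an invented two-row table; only the column names CSA2010 and hhpov15 to hhpov19 follow the poverty data set, and the percentages are made up.

```r
# Toy poverty table with one missing 2019 value (figures are invented).
toy_pov <- data.frame(
  CSA2010 = c("Area A", "Area B"),
  hhpov15 = c(10, 4),
  hhpov16 = c(11, 5),
  hhpov17 = c(12, 3),
  hhpov18 = c(11, 4),
  hhpov19 = c(10.5, NA),
  stringsAsFactors = FALSE
)

# Columns used to impute the missing 2019 value.
past_years <- paste0("hhpov", 15:18)

# Replace each missing 2019 value by the mean of that row's 2015-2018 values.
missing19 <- is.na(toy_pov$hhpov19)
toy_pov$hhpov19[missing19] <-
  rowMeans(toy_pov[missing19, past_years, drop = FALSE])

print(toy_pov)
```

Selecting rows by is.na and columns by name means the code keeps working if rows are reordered, whereas poverty_data[48,7] silently depends on the current ordering.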

2.6.4 Data Wrangling: CCTV

This data set seems rather tidy. We will mostly use the first two columns, which contain information about the location of each CCTV. We therefore still need to make sure that these two columns have no missing values. We do so by combining the which and is.na functions and by filtering for potentially empty observations. This confirms that these columns contain neither NAs nor empty values.

which(is.na(cctv_data$X))
#> integer(0)
which(is.na(cctv_data$Y))
#> integer(0)
filter(cctv_data, cctv_data$X=="")
#>  [1] X                Y                OBJECTID        
#>  [4] CAM_NUM          NOTES            LOCATION        
#>  [7] PROJ             XCOORD           YCOORD          
#> [10] created_user     created_date     last_edited_user
#> [13] last_edited_date
#> <0 rows> (or 0-length row.names)
filter(cctv_data, cctv_data$Y=="") 
#>  [1] X                Y                OBJECTID        
#>  [4] CAM_NUM          NOTES            LOCATION        
#>  [7] PROJ             XCOORD           YCOORD          
#> [10] created_user     created_date     last_edited_user
#> [13] last_edited_date
#> <0 rows> (or 0-length row.names)

Exploratory data analysis

3.1 Calculation of the density of CCTV per community

The CCTV data set which we retrieved posed a slight challenge: although it contained some neighborhood names, most of them did not match the “standard neighborhood” names. Therefore, to solve that issue, we decided to use geospatial counting.

Our procedure included the following steps. After reading the file in as a data table, we define which columns hold the coordinates of the newly created spatial object. The data set provides several types of coordinates; we use X and Y, which are expressed in the EPSG:3857 WGS 84 / Pseudo-Mercator coordinate system. Spatial files must have coordinate systems assigned to them. In the case at hand, we will work with the above-mentioned EPSG:3857 WGS 84 / Pseudo-Mercator coordinate system for all the spatial files that we are going to use. Therefore, to ensure consistency, we create a crs object called crs.geo1 that is going to be assigned to all of them. In order to assign a known crs to spatial data, we use the proj4string function, to which we assign crs.geo1.

#read in data table
balt_dat <-  fread(file = here::here("data/Baltimore_CCTV_Locations_Crime_Cameras.csv"))

#convert to data table
balt_dat <- as.data.table(balt_dat)

#make data spatial
coordinates(balt_dat) <-  c("X","Y")
crs.geo1 <-  CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs +type=crs")
proj4string(balt_dat) <-  crs.geo1  
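Since the CCTV data set carries both EPSG:4326 coordinates (XCOORD, YCOORD) and EPSG:3857 coordinates (X, Y), it can help to see how the two relate. The sketch below implements the standard spherical Web Mercator formulas in base R; the function names and the sample point are our own illustration and are not part of the sp workflow used above.

```r
# Sphere radius used by EPSG:3857 (the +a=6378137 in crs.geo1), in metres.
R_EARTH <- 6378137

# Longitude/latitude in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857).
lonlat_to_mercator <- function(lon, lat) {
  x <- R_EARTH * lon * pi / 180
  y <- R_EARTH * log(tan(pi / 4 + lat * pi / 360))
  cbind(x = x, y = y)
}

# Inverse transformation, back to degrees.
mercator_to_lonlat <- function(x, y) {
  lon <- x / R_EARTH * 180 / pi
  lat <- (2 * atan(exp(y / R_EARTH)) - pi / 2) * 180 / pi
  cbind(lon = lon, lat = lat)
}

# Hypothetical point roughly in central Baltimore.
pt <- lonlat_to_mercator(-76.61, 39.29)
back <- mercator_to_lonlat(pt[, "x"], pt[, "y"])

print(pt)
print(back)
```

The projected values are in metres and of the order of millions, which is why the bounding boxes used later for zooming look so different from ordinary longitude and latitude values.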

Then, we can use plot to quickly visualise the cloud of points which represents all the CCTVs in Baltimore.

plot(balt_dat, pch = 20, col = "steelblue") #We can use the plot function to quickly plot the SpatialPointDataFrame that we created. We see a bunch of points which represent the CCTV location in Baltimore.

Next, we have to work with a shapefile object which is another special type of file. It basically is a set of polygons which represents different areas of the city of Baltimore. We downloaded this file on the city’s open data portal. We read it in and assign to this file our crs.geo1 coordinate system. In this way we ensure that our files have the same coordinate system.

#read in shapefile of baltimore
baltimore <-  readOGR(dsn = here::here("data/Community_Statistical_Area"), layer = "Community_Statistical_Area") #name of file and object
proj4string(baltimore) <- crs.geo1

Shape file document:
[https://arcg.is/1T4jiK]

We can now plot these two spatial files together to see the spread of CCTVs over the 56 community statistical areas.

#plot
plot(baltimore,main="Spread of CCTVs in different communities of Baltimore")
plot(balt_dat,pch=20, col="steelblue" , add=TRUE) #If we plot these two lines together, what we obtain is a map of baltimore, we have the 56 community statistical areas and the CCTVs on top of the map.

To express these results numerically, we need R to count how many CCTVs belong to each area. Here, the over function determines in which polygon each CCTV lies. Next, we tabulate the result in a new object called counts and convert it into a data frame, which is easier to work with. We use sum to check that all 836 observations were counted, which is the case. Still, we notice that counts only has 41 rows, meaning that only 41 out of 56 areas contain at least one CCTV.

#Perform the count
proj4string(balt_dat)
proj4string(baltimore) #To be able to perform the count, we must ensure that the two spatial files have a similar CRS. This is the case as we attributed these two files "crs.geo1" 

res2 <- over(balt_dat,baltimore) #This function tells you to which community each CCTV belongs to
counts <- table(res2$community)
counts <- as.data.frame(counts)
colnames(counts)[1] <- "Community"
sum(counts$Freq) #We see that we have 836 observations in total, this is a good sign as our initial CCTV data set contained 836 observations
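Conceptually, the counting step is a point-in-polygon test applied to every camera. The sketch below shows the idea with a simple ray-casting test on two invented square "communities"; it is only an illustration of the principle and is not how the sp package implements over. All names and coordinates are hypothetical.

```r
# Ray-casting test: TRUE if point (px, py) lies inside the polygon whose
# vertices are given by vx, vy (in order, without repeating the first one).
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses &&
        px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Two adjacent unit squares standing in for community polygons.
polygons <- list(
  "Area A" = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  "Area B" = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)

# Four toy camera locations; the last one lies outside both squares.
cams <- data.frame(x = c(0.5, 0.2, 1.5, 3), y = c(0.5, 0.8, 0.5, 3))

# For each camera, find the name of the polygon containing it (NA if none).
cams$community <- vapply(seq_len(nrow(cams)), function(k) {
  hit <- vapply(polygons, function(p) {
    point_in_polygon(cams$x[k], cams$y[k], p$x, p$y)
  }, logical(1))
  if (any(hit)) names(polygons)[which(hit)[1]] else NA_character_
}, character(1))

toy_counts <- as.data.frame(table(cams$community), stringsAsFactors = FALSE)
print(toy_counts)
```

As in our real data, table only lists areas that received at least one point and silently ignores points that fall in no polygon, which is why the total has to be checked with sum afterwards.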

To make this workable, we create a new CCTV data frame that lists all communities and set the count to 0 wherever the join returns NA, i.e. for communities without any CCTV. Lastly, we create a new column with the mutate function to calculate the CCTV density, defined as the number of CCTVs in an area divided by the total number of CCTVs, expressed as a percentage.

CCTV_per_area <- area_data[2] %>% 
  left_join(counts,by="Community") #One must add the communities where there are no counts i.e no CCTV

CCTV_per_area[is.na(CCTV_per_area)] <- 0

CCTV_per_area <- mutate(CCTV_per_area, density_perc=(CCTV_per_area$Freq/(sum(CCTV_per_area$Freq)))*100)
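The zero-filling and density calculation can be checked on a small example. This base-R sketch uses merge in place of left_join; the three areas and their counts are invented.

```r
# Full list of areas, including one without any camera.
all_areas <- data.frame(Community = c("Area A", "Area B", "Area C"),
                        stringsAsFactors = FALSE)

# Counts as returned by table(): areas with zero cameras are absent.
toy_counts <- data.frame(Community = c("Area A", "Area C"),
                         Freq = c(6, 2),
                         stringsAsFactors = FALSE)

# Keep every area (all.x = TRUE); missing counts come back as NA.
per_area <- merge(all_areas, toy_counts, by = "Community", all.x = TRUE)

# An area without a match has no camera, so its count is 0.
per_area$Freq[is.na(per_area$Freq)] <- 0

# Share of all cameras located in each area, in percent.
per_area$density_perc <- per_area$Freq / sum(per_area$Freq) * 100

print(per_area)
```

Note that density here is a share of the city's cameras rather than cameras per inhabitant or per square kilometre, so the column always sums to 100.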

3.1.1 Mapping of CCTV density

We now want to map CCTV density on the Baltimore map. We first use the %in% operator to check that the community names in the Baltimore spatial data set are the same as those in the CCTV per area data set. As this only returns TRUE values, the names match and we can proceed with the analysis.

library(tmap)
baltimore$community %in% CCTV_per_area$Community

Next, we perform a left_join between the data of the Baltimore spatial polygons data frame and the CCTV per area data set. Because the key column is spelled differently in the two data sets (“community” in one, “Community” in the other), we pass a named vector to the by argument. Finally, we create the map with the tmap package, which works much like ggplot2: a map always starts with a tm_shape call, to which further layers are added with the plus operator. We used the Baltimore shape file, filled it with the density percentage, defined some breaks, and set the borders and finally the layout.

baltimore@data <- left_join(baltimore@data, CCTV_per_area, by = c('community' = 'Community'))

CCTV_dens_map <- tm_shape(baltimore) + tm_fill(col = "density_perc", title ="CCTV density", breaks=c(0,1,2,3,4,5,6,7,8,9,10,11)) + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,main.title = "We tend to have high CCTV density in the city center",title.size =1 ,title.position= c("center","top"), legend.outside=T,legend.outside.position="right")

tmap_mode("plot")
CCTV_dens_map

One of the first interesting observations we can make is that CCTV density seems to be higher in the city center.

3.2 Calculation of the crime per capita per community

We want to calculate crime per 1000 inhabitants per community. To achieve that, we group the crime_data_with_areas data set by community and then use summarize, which enables us to compute the crime frequency for each area. Then, using the population data, we divide the crime frequency by the number of inhabitants in each area. We finally multiply this by 1000 to obtain the crime per 1000 inhabitants, a unit commonly used in crime statistics. Again, we add one more row because we have no values for the prison. To make sure we made no mistake, we add up the CrimeFrequency column to see whether it equals 349482, i.e. the 350’294 original observations minus the 764 without a neighborhood and the 48 with a missing or pre-2000 date. This is the case, so we can proceed with confidence. We also create a data frame called Community_data which will contain all the values we compute at community level.

CrimeStatsPerArea <- crime_data_with_areas %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency=n())

CrimeStatsPerArea <-  mutate(CrimeStatsPerArea,CrimePer1000inhabitants=((CrimeStatsPerArea$CrimeFrequency/population_data$tpop20)*1000))

CrimeStatsPerArea <- rbind(CrimeStatsPerArea,list("Unassigned -- Jail",0,0))  #We have no information about crimes committed in jail, yet, the community statistical area encompass 56 area, including jail. In order to ensure consistency, we must add a 56th observation in this data frame.

sum(CrimeStatsPerArea$CrimeFrequency) #The total sum is 349482, which is what we expect

Community_data <- CrimeStatsPerArea[,-2] %>% 
  left_join(CCTV_per_area,by="Community") %>%
  left_join(poverty_data[,c(2,7)],by=c("Community"="CSA2010"))
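The division above pairs crime counts and populations by row position, which is only valid if both tables list the communities in the same order. A more defensive variant joins the two tables by community name first. The following is a sketch with invented figures; only the column names Community, community and tpop20 follow our data sets.

```r
# Toy crime counts, deliberately listed in a different order than the
# population table (all figures are invented).
crime_freq <- data.frame(Community = c("Area B", "Area A"),
                         CrimeFrequency = c(50, 200),
                         stringsAsFactors = FALSE)

population <- data.frame(community = c("Area A", "Area B"),
                         tpop20 = c(10000, 5000),
                         stringsAsFactors = FALSE)

# Match rows by name, not by position.
rates <- merge(crime_freq, population,
               by.x = "Community", by.y = "community")

# Crimes per 1000 inhabitants.
rates$CrimePer1000inhabitants <- rates$CrimeFrequency / rates$tpop20 * 1000

print(rates)
```

With positional division, Area B would have been divided by Area A's population in this toy example; joining by name avoids that class of error.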

3.2.1 Mapping of crime per capita per community

We want to map crime per capita per community. The methodology is the same as we did for CCTV density. This time, we use the “quantile” method to create category breaks. We see that crime tends to be higher in the city center.

library(tmap)

baltimore$community %in% CrimeStatsPerArea$Community #We see that we have a perfect match

baltimore@data <- left_join(baltimore@data, CrimeStatsPerArea, by = c('community' = 'Community'))

Crime_per_capita_map <- tm_shape(baltimore) + tm_fill(col = "CrimePer1000inhabitants", title ="Crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,title = "Crime tends to be higher in the city center of Baltimore",title.size =1 ,title.position= c("center","top"))

tmap_mode("plot")
Crime_per_capita_map
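To clarify what quantile breaks mean, the sketch below computes them by hand in base R on ten invented crime rates. The choice of five classes and the use of R's default quantile definition are assumptions made for illustration; the breaks chosen by tmap may differ in detail.

```r
# Ten hypothetical crime rates per 1000 inhabitants, with one extreme value.
rate <- c(12, 35, 48, 51, 60, 72, 88, 95, 120, 310)

# Break points that split the values into five equally populated classes.
breaks <- quantile(rate, probs = seq(0, 1, by = 0.2))

# Class index (1 to 5) of each area; include.lowest keeps the minimum.
class_id <- cut(rate, breaks = breaks, include.lowest = TRUE, labels = FALSE)

print(breaks)
print(table(class_id))
```

Each class contains the same number of areas, so a single extreme value does not compress all other areas into one colour, as it would with equally spaced breaks.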

3.2.2 Creation of a distorted map

To visualise the distribution of crime per capita per community in Baltimore, we decided to use a distorted map. Again, we use the tmap package, this time together with the cartogram_ncont function, which distorts the map based on the intensity of crime per capita in each community. Concretely, we want to show that crime per capita is higher in the city center than in the suburban areas. This can be shown quite neatly graphically.

Distorted_Crime_map <- tm_shape(cartogram_ncont(baltimore, "CrimePer1000inhabitants"))+tm_fill(col = "CrimePer1000inhabitants", title ="Crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,main.title = str_wrap(("Crime tends to be higher in the city center of Baltimore"),width = 70),title.size =1 ,title.position= c("center","top"), legend.outside=T,legend.outside.position="right") #This map distorts the size of each area depending on its crime per capita. It is interesting as it enables one to see that higher crime per capita tends to be concentrated in the city center.

tmap_mode("plot")
Distorted_Crime_map

3.2.3 The prison anomaly

One may wonder what the little square with no crime per capita, surrounded by CCTVs in the very center of Baltimore, is. It actually is the prison. Zooming in on this little square is interesting. In order to create a “sub-map”, we define a smaller area using the st_bbox function. The values passed to the function are the minimum and maximum values on the x-axis and y-axis of the area, expressed in the EPSG:3857 WGS 84 / Pseudo-Mercator coordinate system. Then, using tm_shape with the new spatial object called Prison_area as argument, we create a map called Prison_map in the same way as before. As we want to be able to locate this smaller area on the full map of Baltimore, we must also create a Baltimore map with a rectangle representing the newly created “sub-area”. In order to combine these two maps, we run the last two lines together and use the viewport function. The output is a zoom on the desired area combined with the bigger map, on which a rectangle marks the area we are analyzing. The reason why we see so many CCTVs located on the right-hand side of the map is that the main entrance is located there.

tmap_mode("plot")

Prison_area <-  st_bbox(c(xmin = -8529169.92, xmax = -8526465.97,
                      ymin =4764196.55, ymax = 4765056.50),
                    crs = st_crs(baltimore)) %>% st_as_sfc()
 
Prison_map <- tm_shape(Prison_area) + tm_borders(col="black",alpha=0.3)+ tm_shape(baltimore) + tm_fill(col = "CrimePer1000inhabitants", title ="Crime per capita",style = "quantile") + tm_borders(col="black") + tm_layout(inner.margins = 0.05,frame.lwd = 5,main.title = str_wrap(("We have no data about crime committed in prison, this explains this crime-free zone."),width=75),main.title.position = c('left', 'top'),main.title.size = 1)+tm_scale_bar(position = c("left", "top"))+ tm_shape(balt_dat) + tm_dots(col="black") #This map zooms in on the prison. This "area" is special: we have no data on crime there, and we can also see a large concentration of CCTVs directly next to the prison.


Baltimore_map <- tm_shape(baltimore) + tm_borders()+ tm_shape(Prison_area) + tm_borders(lwd = 3,col = "red") + tm_layout(frame.lwd = 6,inner.margins = 0.05)


Prison_map
print(Baltimore_map, vp = viewport(0.8, 0.27, width = 0.5, height = 0.5)) #Running these two lines together draws the zoomed prison map with the full Baltimore map, and its red rectangle, as an inset
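The bounding-box values used above are expressed in EPSG:3857 metres rather than in degrees. As a rough illustration of where such numbers come from, the sketch below converts between longitude/latitude and Web Mercator coordinates with the standard spherical formula. The helper names `lonlat_to_3857` and `xy3857_to_lonlat` are hypothetical and not part of the original analysis; in practice a function such as `sf::st_transform` would normally handle the conversion.

```r
# Spherical Web Mercator (EPSG:3857) uses the WGS 84 semi-major axis as radius.
earth_radius <- 6378137

# Hypothetical helper: degrees (lon, lat) -> metres (x, y) in EPSG:3857.
lonlat_to_3857 <- function(lon, lat) {
  x <- earth_radius * lon * pi / 180
  y <- earth_radius * log(tan(pi / 4 + lat * pi / 360))
  c(x = x, y = y)
}

# Hypothetical helper: metres (x, y) in EPSG:3857 -> degrees (lon, lat).
xy3857_to_lonlat <- function(x, y) {
  lon <- x / earth_radius * 180 / pi
  lat <- (2 * atan(exp(y / earth_radius)) - pi / 2) * 180 / pi
  c(lon = lon, lat = lat)
}

# Corners of the prison bounding box used above, converted back to degrees.
sw_corner <- xy3857_to_lonlat(-8529169.92, 4764196.55)
ne_corner <- xy3857_to_lonlat(-8526465.97, 4765056.50)
print(round(rbind(sw_corner, ne_corner), 4))
```

Converting the corners back to degrees is a quick sanity check that the box really lies inside Baltimore before it is passed to st_bbox.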

These maps show that both CCTV density and crime per capita seem to be higher in the city center. We investigate this relationship in the chapter “Analysis”.

3.3 Calculation of crime per capita by type of crime

There are different ways to categorise crime by type. Crime can, for example, be categorised by severity: the law typically distinguishes three basic classes of criminal offenses, namely infractions, misdemeanors, and felonies. Crimes can also be categorised by their nature, where one usually differentiates property crime from violent crime. We are going to classify the crimes contained in the crime_data_with_areas element both by severity and by nature.

3.3.1 Felonies and Misdemeanors

We first list the unique values of the “Description” column of the crime data set and see that it contains 14 types of crime. Our data set contains no infractions, so the 14 types of crime are divided between the two remaining categories as follows.

  • Misdemeanor: LARCENY FROM AUTO,COMMON ASSAULT, ROBBERY - COMMERCIAL, LARCENY
  • Felony: RAPE, ARSON, HOMICIDE, BURGLARY, AUTO THEFT, ROBBERY - CARJACKING, AGG. ASSAULT, ROBBERY - STREET, ROBBERY - RESIDENCE, SHOOTING
unique(crime_data_with_areas$Description)

#We see that we have 14 types of crime. We want to observe crimes by type, so we create new classifications. The law consists of three basic classifications of criminal offenses: infractions, misdemeanors, and felonies. Our data set contains no infractions.

#Misdemeanor:LARCENY FROM AUTO,COMMON ASSAULT, ROBBERY - COMMERCIAL, LARCENY
#Felony: RAPE, ARSON, HOMICIDE, BURGLARY, AUTO THEFT, ROBBERY - CARJACKING, AGG. ASSAULT, ROBBERY - STREET, ROBBERY - RESIDENCE, SHOOTING

Next, we create a data set called crime_cat, which maps each recorded crime description to its category. This data set is used in a left join with crime_data_with_areas, which adds an extra column indicating whether each crime is a felony or a misdemeanor.

crime_cat <- data.frame(Category=c("Misdemeanor","Felony"), Description=c(c("LARCENY FROM AUTO,COMMON ASSAULT,ROBBERY - COMMERCIAL,LARCENY"),c("RAPE,ARSON,HOMICIDE,BURGLARY,AUTO THEFT,ROBBERY - CARJACKING,AGG. ASSAULT,ROBBERY - STREET,ROBBERY - RESIDENCE,SHOOTING")))

crime_cat <- separate_rows(crime_cat, Description, sep = ",")

crime_cat$Description %in% unique(crime_data_with_areas$Description) #Ensure we have a perfect match

crime_data_with_areas <- crime_data_with_areas %>% 
  left_join(crime_cat,by="Description") #We add a new variable to our crime data set
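The `%in%` check above returns one logical value per description, which is easy to misread. The following is a minimal sketch, on a toy data frame rather than the real crime data, of how the same lookup could be built and validated in base R with a single TRUE/FALSE answer. The object names `toy_crimes` and `lookup` are illustrative only.

```r
# Toy stand-in for crime_data_with_areas (illustrative values only).
toy_crimes <- data.frame(
  Description = c("LARCENY", "HOMICIDE", "COMMON ASSAULT", "ARSON", "LARCENY"),
  stringsAsFactors = FALSE
)

# One comma-separated string per category, as in crime_cat above.
categories <- c(
  Misdemeanor = "LARCENY FROM AUTO,COMMON ASSAULT,ROBBERY - COMMERCIAL,LARCENY",
  Felony      = "RAPE,ARSON,HOMICIDE,BURGLARY,AUTO THEFT,ROBBERY - CARJACKING,AGG. ASSAULT,ROBBERY - STREET,ROBBERY - RESIDENCE,SHOOTING"
)

# Split each string into one row per description (base R analogue of separate_rows).
pieces <- strsplit(categories, ",", fixed = TRUE)
lookup <- data.frame(
  Category    = rep(names(pieces), lengths(pieces)),
  Description = unlist(pieces, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Collapse the element-wise check into a single TRUE/FALSE.
all_known <- all(toy_crimes$Description %in% lookup$Description)

# A description listed twice would duplicate rows in a left join.
no_dupes <- !anyDuplicated(lookup$Description)

# match() keeps the row order of toy_crimes, like a left join on Description.
toy_crimes$Category <- lookup$Category[match(toy_crimes$Description, lookup$Description)]
print(toy_crimes)
```

Checking for duplicated descriptions in the lookup matters because a duplicated key would silently inflate the number of observations after the join.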

Finally, we compute the crime statistics per type, per area. Here, we again use the piping operator but this time, we group by both the community and the category. Again, we check that we indeed have 349482 observations. Afterward, we compute both felony and misdemeanor per capita in each community and (again) add the prison line into the newly created data sets.

CrimePerCategoryPerArea <- crime_data_with_areas %>% 
  group_by(Community,Category) %>%
  summarize(RepartitionPerCategoryPerArea=n())

sum(CrimePerCategoryPerArea$RepartitionPerCategoryPerArea) #Again, we check that we indeed have 349482 observations

CrimeCategoryRepartition <- CrimePerCategoryPerArea %>% 
  group_by(Category) %>% 
  summarise(Repartition=sum(RepartitionPerCategoryPerArea)) #We observe that in Baltimore, the number of felonies is close to the number of misdemeanors

FelonyStats <-  CrimePerCategoryPerArea %>% filter(Category=="Felony") 

FelonyStats$FelonyPerCapitaPerArea <-((CrimePerCategoryPerArea%>% filter(Category=="Felony"))[[3]]/population_data$tpop20)*1000

FelonyStats[56,] <- list("Unassigned -- Jail","Felony",0,0)

MisdemeanorStats <-  CrimePerCategoryPerArea %>% filter(Category=="Misdemeanor") 

MisdemeanorStats$MisdemeanorPerCapitaPerArea <-((CrimePerCategoryPerArea%>% filter(Category=="Misdemeanor"))[[3]]/population_data$tpop20)*1000

MisdemeanorStats[56,] <- list("Unassigned -- Jail","Misdemeanor",0,0)

Community_data <- Community_data %>% 
  left_join(FelonyStats[,-c(2:3)],by="Community") %>%
  left_join(MisdemeanorStats[,-c(2:3)],by="Community")
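The per capita computations above divide the filtered counts by population_data$tpop20 position by position, which assumes both tables list the communities in the same order. As a hedged sketch with toy numbers, the division can instead be keyed on the community name. The `Community` column in the population table is an assumption made for illustration; only `tpop20` mirrors the original.

```r
# Toy counts per community and category (illustrative numbers, not real data).
toy_counts <- data.frame(
  Community = c("A", "A", "B", "B", "C"),
  Category  = c("Felony", "Misdemeanor", "Felony", "Misdemeanor", "Felony"),
  n         = c(30, 50, 10, 40, 5),
  stringsAsFactors = FALSE
)

# Toy population table, deliberately in a different row order.
# The column name Community is an assumption; tpop20 mirrors the original.
toy_population <- data.frame(
  Community = c("C", "A", "B"),
  tpop20    = c(1000, 2000, 4000),
  stringsAsFactors = FALSE
)

# Join on the community name so each count meets its own population.
felony <- merge(
  toy_counts[toy_counts$Category == "Felony", ],
  toy_population,
  by = "Community", all.x = TRUE
)
felony$FelonyPer1000 <- felony$n / felony$tpop20 * 1000
print(felony)
```

A community with no recorded felony would simply be absent from the filtered counts, so a positional division would shift every following row; the keyed join avoids that.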

3.3.2 Mapping of felonies and Misdemeanors

After ensuring that we have a perfect match, we perform a left join of the felony and misdemeanor statistics with the baltimore object and map everything side by side using tmap_arrange.

#Felony

baltimore$community %in% FelonyStats$Community

baltimore@data <- left_join(baltimore@data, FelonyStats, by = c('community' = 'Community'))

Felony_map <- tm_shape(baltimore) + tm_fill(col = "FelonyPerCapitaPerArea", title ="Felony (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,title="Felony tends to be more located in the western part of the city",title.size =1 ,title.position= c("center","top"))

#Misdemeanor

baltimore$community %in% MisdemeanorStats$Community

baltimore@data <- left_join(baltimore@data, MisdemeanorStats, by = c('community' = 'Community'))

Misdemeanor_map <- tm_shape(baltimore) + tm_fill(col = "MisdemeanorPerCapitaPerArea", title ="Misdemeanor (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,title="Misdemeanor tends to be more located in the eastern part of the city",title.size =1 ,title.position= c("center","top"))

tmap_arrange(Felony_map,Misdemeanor_map)


Here we observe some slight differences compared to overall crime: more severe crimes tend to be slightly more concentrated in the western part of the city, while less severe crimes are slightly more concentrated in the eastern part of Baltimore.


3.3.3 Felonies VS Misdemeanors - Do we have an equal crime type distribution?

It is always interesting to look for patterns in crime data. The idea here is to analyse whether felonies and misdemeanors are distributed similarly across areas. A simple linear regression suggests that the two types of crime are rather evenly distributed across areas, with a decent adjusted \(R^2\) of 62.2%. Still, the biggest outlier on the scatter plot is Downtown/Seton Hill, where misdemeanors per capita are much higher than felonies per capita. We do not know whether this finding is relevant, yet it should be mentioned that this area is also one of the richest in Baltimore.

Felony_VS_Misdemeanor <- FelonyStats %>% 
  left_join(MisdemeanorStats,by="Community")

regression4 <- lm(Felony_VS_Misdemeanor$MisdemeanorPerCapitaPerArea~Felony_VS_Misdemeanor$FelonyPerCapitaPerArea)
#This allows us to see whether Felony and Misdemeanors are correlated. This seems to be the case
Felony VS Misdemeanor

                                      Dependent variable:
                                      Felony (per 1000 inhabitants)
Misdemeanor (per 1000 inhabitants)    1.020***
                                      (0.107)
Intercept                             42.000
                                      (34.700)
Observations                          56
R2                                    0.629
Adjusted R2                           0.622
Residual Std. Error                   119.000 (df = 54)
F Statistic                           91.700*** (df = 1; 54)
Note:                                 *p<0.1; **p<0.05; ***p<0.01
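The table reports both an R2 and an adjusted R2. A minimal sketch on simulated data (not the Baltimore figures) shows how both can be read from a fitted model and how they relate; the variable names are illustrative. With the values reported above (R2 = 0.629, 56 observations, one regressor), the same formula gives 1 - 0.371 x 55/54, or about 0.622.

```r
# Simulated per-area rates (illustrative only, not the Baltimore data).
set.seed(1)
toy <- data.frame(felony = runif(56, 50, 500))
toy$misdemeanor <- 40 + toy$felony + rnorm(56, sd = 100)

# Passing data = keeps coefficient names short and lets predict() work later.
fit <- lm(misdemeanor ~ felony, data = toy)
fit_summary <- summary(fit)

r2     <- fit_summary$r.squared
adj_r2 <- fit_summary$adj.r.squared

# Adjusted R^2 penalises the number of regressors p:
# 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n <- nrow(toy)
p <- 1
adj_r2_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2, adj_r2_by_hand = adj_r2_by_hand))
```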



Downtown_label <- Felony_VS_Misdemeanor[14,]

ggplot(data=Felony_VS_Misdemeanor,mapping= aes(x=FelonyPerCapitaPerArea,y=MisdemeanorPerCapitaPerArea)) + 
  labs(title = str_wrap(("Misdemeanors and felonies are rather equally distributed in Baltimore"),width=65), x="Felony (per 1000 inhabitants)",y="Misdemeanor (per 1000 inhabitants)")+
  geom_point(data=Downtown_label) +
  ggrepel::geom_label_repel(aes(label=Community),data = Downtown_label,min.segment.length = 0.5)+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)

3.3.4 Violent crime and Property crime

As mentioned earlier, it is also possible to divide the crimes committed in Baltimore by their nature. A distinction is generally made between property crime and violent crime. In a property crime, a victim’s property is stolen or destroyed, without the use or threat of force against the victim. Property crimes include burglary and theft as well as vandalism and arson. In a violent crime, a victim is harmed by or threatened with violence. Violent crimes include rape and sexual assault, robbery, assault and murder.

To determine whether each crime in crime_data_with_areas is a violent or a property crime, we use another data set from the Baltimore open data portal, which describes the crime codes used by the police to categorize crimes. We first import the data set. We then check that the codes in both data sets match; three crime codes are written with a trailing blank space, which we correct. Next, using the left_join function, we add a new column to our crime_data_with_areas data frame. Finally, we create data frames for both violent and property crime, following the same methodology as for felonies and misdemeanors.

Extra data set:
[https://data.baltimorecity.gov/documents/crime-codes/about]

crimecode_data <- read.csv(file = here::here("data/Balt_CRIME_CODES.csv"))

unique(crime_data_with_areas$CrimeCode) %in% unique(crimecode_data$CODE) #We identify spelling errors

crimecode_data$CODE[185] <- "8H"
crimecode_data$CODE[186] <- "8I"
crimecode_data$CODE[187] <- "8J"

crime_data_with_areas <- crime_data_with_areas %>% 
  left_join(crimecode_data[,c(1,8)],by=c("CrimeCode"="CODE"))

unique(crime_data_with_areas$VIO_PROP_CFS)
which(is.na(crime_data_with_areas$VIO_PROP_CFS)) #We ensure that we have no NAs
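Correcting the three codes by row index works as long as the source file never changes. As a hedged alternative, the sketch below strips stray whitespace from every code at once with base R's `trimws`. The toy table, the made-up codes "X1" and "X2", and the category labels attached to each code are illustrative and do not reflect the real classification.

```r
# Toy code table: three codes carry a trailing space, as described above.
# The labels in VIO_PROP_CFS are placeholders, not the real classification.
toy_codes <- data.frame(
  CODE         = c("8H ", "8I ", "8J ", "X1", "X2"),
  VIO_PROP_CFS = c("PROPERTY", "PROPERTY", "PROPERTY", "VIOLENT", "PROPERTY"),
  stringsAsFactors = FALSE
)

# Codes as they might appear in the crime records.
observed_codes <- c("8H", "X1", "8J")

# Before cleaning, codes with stray whitespace do not match.
unmatched_before <- setdiff(observed_codes, toy_codes$CODE)

# trimws() strips leading and trailing whitespace from every code at once.
toy_codes$CODE <- trimws(toy_codes$CODE)
unmatched_after <- setdiff(observed_codes, toy_codes$CODE)

print(list(before = unmatched_before, after = unmatched_after))
```

Listing the unmatched codes with `setdiff` also names the offending values directly, rather than leaving them to be located in a vector of TRUE and FALSE.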

CrimePerCategory2PerArea <- crime_data_with_areas %>% 
  group_by(Community,VIO_PROP_CFS) %>%
  summarize(RepartitionPerCategory2PerArea=n())

sum(CrimePerCategory2PerArea$RepartitionPerCategory2PerArea) #Again, we check that we indeed have 349482 observations

CrimeCategory2Repartition <- CrimePerCategory2PerArea %>% 
  group_by(VIO_PROP_CFS) %>% 
  summarise(Repartition=sum(RepartitionPerCategory2PerArea))

PropertyStats <-  CrimePerCategory2PerArea %>% filter(VIO_PROP_CFS=="PROPERTY") 

PropertyStats$PropertyCrimePerCapitaPerArea <-((CrimePerCategory2PerArea%>% filter(VIO_PROP_CFS=="PROPERTY"))[[3]]/population_data$tpop20)*1000

PropertyStats[56,] <- list("Unassigned -- Jail","PROPERTY",0,0)

ViolentStats <-  CrimePerCategory2PerArea %>% filter(VIO_PROP_CFS=="VIOLENT") 

ViolentStats$ViolentCrimePerCapitaPerArea <-((CrimePerCategory2PerArea%>% filter(VIO_PROP_CFS=="VIOLENT"))[[3]]/population_data$tpop20)*1000

ViolentStats[56,] <- list("Unassigned -- Jail","VIOLENT",0,0)

Community_data <- Community_data %>% 
  left_join(ViolentStats[,c(1,4)],by="Community") %>% 
  left_join(PropertyStats[,c(1,4)],by="Community")

3.3.5 Mapping of violent and property crime

After ensuring that we have a perfect match, we perform a left join of the violent crime and property crime statistics with the baltimore object and map everything side by side using tmap_arrange.

#Violent Crime

baltimore$community %in% ViolentStats$Community

baltimore@data <- left_join(baltimore@data, ViolentStats[,c(1,4)], by = c('community' = 'Community'))

Violent_Crime_map <- tm_shape(baltimore) + tm_fill(col = "ViolentCrimePerCapitaPerArea", title ="Violent crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,title="Violent crime tends to be more located in the western part of the city",title.size =1 ,title.position= c("center","top"))

#Property Crime

baltimore$community %in% PropertyStats$Community

baltimore@data <- left_join(baltimore@data, PropertyStats[,c(1,4)], by = c('community' = 'Community'))

Property_Crime_map <- tm_shape(baltimore) + tm_fill(col = "PropertyCrimePerCapitaPerArea", title ="Property crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,title="Property crime tends to be more located in the eastern part of the city",title.size =1 ,title.position= c("center","top"))

tmap_mode("plot")
tmap_arrange(Violent_Crime_map,Property_Crime_map)

Interestingly, we also observe a difference in the distribution of crime by nature compared to overall crime. As with felonies and misdemeanors, violent crime tends to be more concentrated in the western part of the city, while property crime is more concentrated in the eastern part of Baltimore.

3.4 Calculation of crime evolution

The idea is to see how crime evolves over time. We therefore created one data set per year. Comparing the number of observations in each yearly data set, we see roughly 40,000 cases a year, except for 2020 (most likely because of COVID) and 2021 (which is not finished). We do not create data sets for 2013 and earlier because very few observations (76) predate 2014. The graph shows the monthly evolution of crime for each year. There seems to be a pattern: each year, crime increases mid-year before decreasing in winter. The effect of colder temperatures and snow on crime in Baltimore is known to the Baltimore Police Department, which nevertheless admits that “snow’s effect on crime can be hard to predict”, as it may depend on the type of crime. Domestic violence, for example, often increases as temperatures decrease.

Interesting article on the effect of snow on crime in Baltimore:
[https://www.baltimoresun.com/maryland/baltimore-city/bal-md.ci.snowcrime09feb09-story.html]

Crime_in_2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-01-01") & CrimeDateTime <= as.Date("2021-12-31"))

Crime_in_2020 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2020-01-01") & CrimeDateTime <= as.Date("2020-12-31"))

Crime_in_2019 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2019-01-01") & CrimeDateTime <= as.Date("2019-12-31"))

Crime_in_2018 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2018-01-01") & CrimeDateTime <= as.Date("2018-12-31"))

Crime_in_2017 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2017-01-01") & CrimeDateTime <= as.Date("2017-12-31"))

Crime_in_2016 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2016-01-01") & CrimeDateTime <= as.Date("2016-12-31"))

Crime_in_2015 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2015-01-01") & CrimeDateTime <= as.Date("2015-12-31"))

Crime_in_2014 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2014-01-01") & CrimeDateTime <= as.Date("2014-12-31"))

crime_data_with_areas %>%  filter(CrimeDateTime < as.Date("2014-01-01")) #We see that we have very few (76) observations before 2014, thus we do not consider them

Crime_Monthly_evolution_map <- crime_data_with_areas %>% 
  count(month=floor_date(CrimeDateTime,"month")) %>% 
  ggplot(aes(month,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2021-08-31")))+
  labs(title = "Crime seasonality observation: crime tends to decrease in winter.",x="Year",y="Crime occurrences")
#This enables us to see how crime evolves, month after month

Crime_Monthly_evolution_map
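The monthly tally above relies on floor_date(..., "month"), which maps each date to the first day of its month before counting. A small base R sketch on toy dates illustrates the same idea; the object names are illustrative.

```r
# Toy crime dates (illustrative only).
toy_dates <- as.Date(c("2019-01-03", "2019-01-28", "2019-02-14",
                       "2019-07-04", "2019-07-19", "2019-07-30"))

# Map every date to the first day of its month.
month_start <- as.Date(format(toy_dates, "%Y-%m-01"))

# Count occurrences per month; as.data.frame() turns the table into columns.
monthly <- as.data.frame(table(month = month_start), stringsAsFactors = FALSE)
monthly$month <- as.Date(monthly$month)
names(monthly)[names(monthly) == "Freq"] <- "n"
print(monthly)
```

Months without any recorded crime do not appear in such a tally, so a line plot connects the neighbouring months directly instead of dropping to zero.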

Next, we calculate the crime per capita for each year using the piping operator, grouping by community and summarizing the frequencies. Finally, we create the crime_evolution data set, which combines all the yearly data.

#_____ Calculations of the crime rates

CrimePerCapitaPerArea2021 <- Crime_in_2021 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency21=n())

CrimePerCapitaPerArea2021 <-  mutate(CrimePerCapitaPerArea2021,CrimePer1000inhabitants21=((CrimePerCapitaPerArea2021$CrimeFrequency21/population_data$tpop20)*1000))

CrimePerCapitaPerArea2021 <- rbind(CrimePerCapitaPerArea2021,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2020 <- Crime_in_2020 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency20=n())

CrimePerCapitaPerArea2020 <-  mutate(CrimePerCapitaPerArea2020,CrimePer1000inhabitants20=((CrimePerCapitaPerArea2020$CrimeFrequency20/population_data$tpop20)*1000))

CrimePerCapitaPerArea2020 <- rbind(CrimePerCapitaPerArea2020,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2019 <- Crime_in_2019 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency19=n())

CrimePerCapitaPerArea2019 <-  mutate(CrimePerCapitaPerArea2019,CrimePer1000inhabitants19=((CrimePerCapitaPerArea2019$CrimeFrequency19/population_data$tpop20)*1000))

CrimePerCapitaPerArea2019 <- rbind(CrimePerCapitaPerArea2019,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2018 <- Crime_in_2018 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency18=n())

CrimePerCapitaPerArea2018 <-  mutate(CrimePerCapitaPerArea2018,CrimePer1000inhabitants18=((CrimePerCapitaPerArea2018$CrimeFrequency18/population_data$tpop20)*1000))

CrimePerCapitaPerArea2018 <- rbind(CrimePerCapitaPerArea2018,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2017 <- Crime_in_2017 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency17=n())

CrimePerCapitaPerArea2017 <-  mutate(CrimePerCapitaPerArea2017,CrimePer1000inhabitants17=((CrimePerCapitaPerArea2017$CrimeFrequency17/population_data$tpop20)*1000))

CrimePerCapitaPerArea2017 <- rbind(CrimePerCapitaPerArea2017,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2016 <- Crime_in_2016 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency16=n())

CrimePerCapitaPerArea2016 <-  mutate(CrimePerCapitaPerArea2016,CrimePer1000inhabitants16=((CrimePerCapitaPerArea2016$CrimeFrequency16/population_data$tpop20)*1000))

CrimePerCapitaPerArea2016 <- rbind(CrimePerCapitaPerArea2016,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2015 <- Crime_in_2015 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency15=n())

CrimePerCapitaPerArea2015 <-  mutate(CrimePerCapitaPerArea2015,CrimePer1000inhabitants15=((CrimePerCapitaPerArea2015$CrimeFrequency15/population_data$tpop20)*1000))

CrimePerCapitaPerArea2015 <- rbind(CrimePerCapitaPerArea2015,list("Unassigned -- Jail",0,0))

CrimePerCapitaPerArea2014 <- Crime_in_2014 %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency14=n())

CrimePerCapitaPerArea2014 <-  mutate(CrimePerCapitaPerArea2014,CrimePer1000inhabitants14=((CrimePerCapitaPerArea2014$CrimeFrequency14/population_data$tpop20)*1000))

CrimePerCapitaPerArea2014 <- rbind(CrimePerCapitaPerArea2014,list("Unassigned -- Jail",0,0))

crime_evolution <- CrimePerCapitaPerArea2021 %>% 
  left_join(CrimePerCapitaPerArea2020,by="Community") %>% 
  left_join(CrimePerCapitaPerArea2019,by="Community") %>%
  left_join(CrimePerCapitaPerArea2018,by="Community") %>%
  left_join(CrimePerCapitaPerArea2017,by="Community") %>% 
  left_join(CrimePerCapitaPerArea2016,by="Community") %>% 
  left_join(CrimePerCapitaPerArea2015,by="Community") %>% 
  left_join(CrimePerCapitaPerArea2014,by="Community")

Community_data <- Community_data %>% 
  left_join(crime_evolution,by="Community")
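The yearly blocks above differ only in the year. As a hedged sketch of a more compact route, a single cross-tabulation by community and year yields all yearly counts at once. The toy records and the named population vector are assumptions made for illustration.

```r
# Toy incident records (illustrative only).
toy_crimes <- data.frame(
  Community     = c("A", "A", "B", "A", "B", "B", "B"),
  CrimeDateTime = as.Date(c("2019-03-01", "2019-08-15", "2019-05-02",
                            "2020-01-20", "2020-02-11", "2020-06-30", "2020-11-05")),
  stringsAsFactors = FALSE
)

# Named population vector, so the division is matched by community name.
toy_population <- c(A = 2000, B = 4000)

# Extract the calendar year once instead of filtering year by year.
toy_crimes$Year <- format(toy_crimes$CrimeDateTime, "%Y")

# Rows are communities, columns are years, cells are incident counts.
counts <- table(toy_crimes$Community, toy_crimes$Year)

# Divide each row by its own population and scale to crimes per 1000 inhabitants.
rates <- unclass(counts) / unname(toy_population[rownames(counts)]) * 1000
print(rates)
```

A community with no crime in a given year shows up as 0 in the cross-tabulation rather than as a missing row, which keeps every yearly column the same length.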

Another interesting way to visualise how crime evolved is an animated map, which shows the evolution of crime in each community. We can create animated maps with the tmap_animation function and the tm_facets element when building the map. To use them, however, we have to build a rather particular tibble. Here, we want the animated map to display the evolution of crime per capita over 7 years (2014 to 2020; we drop 2021 because the year is incomplete). We therefore need 7 x 56 = 392 observations: one crime per capita value per year for each of the 56 areas. The tibble is peculiar in that each observation also carries, in a separate column, the polygon (an S4 object) of the corresponding area. Since rep cannot be applied directly to a single S4 object, we wrote the list of polygons out manually.

Once the tibble is built, we want to merge its data into a SpatialPolygonsDataFrame, namely the baltimore object. However, as the tibble contains 392 observations, this would enlarge the SpatialPolygonsDataFrame. As the baltimore object is also used for other purposes, we work on a copy (baltimore_alias). We then merge the tibble into the copy using left_join. We create the bbox object and an object called pb: the first delimits the geographical area of interest and the second defines custom class breaks. Finally, we build the map with tm_shape and turn it into a GIF with tmap_animation.

anim_tibble <-  tibble(Year=rep(2020:2014,56),Community=rep(Community_data$Community,each=7),CrimeRate=as.vector(t(crime_evolution[,-c(1,2,3,4,6,8,10,12,14,16)])),geometry=list(
  baltimore@polygons[[1]],baltimore@polygons[[1]],baltimore@polygons[[1]],baltimore@polygons[[1]],baltimore@polygons[[1]],baltimore@polygons[[1]],baltimore@polygons[[1]],
  baltimore@polygons[[2]],baltimore@polygons[[2]],baltimore@polygons[[2]],baltimore@polygons[[2]],baltimore@polygons[[2]],baltimore@polygons[[2]],baltimore@polygons[[2]],
  baltimore@polygons[[3]],baltimore@polygons[[3]],baltimore@polygons[[3]],baltimore@polygons[[3]],baltimore@polygons[[3]],baltimore@polygons[[3]],baltimore@polygons[[3]],
  baltimore@polygons[[4]],baltimore@polygons[[4]],baltimore@polygons[[4]],baltimore@polygons[[4]],baltimore@polygons[[4]],baltimore@polygons[[4]],baltimore@polygons[[4]],
  baltimore@polygons[[5]],baltimore@polygons[[5]],baltimore@polygons[[5]],baltimore@polygons[[5]],baltimore@polygons[[5]],baltimore@polygons[[5]],baltimore@polygons[[5]],
  baltimore@polygons[[6]],baltimore@polygons[[6]],baltimore@polygons[[6]],baltimore@polygons[[6]],baltimore@polygons[[6]],baltimore@polygons[[6]],baltimore@polygons[[6]],
  baltimore@polygons[[7]],baltimore@polygons[[7]],baltimore@polygons[[7]],baltimore@polygons[[7]],baltimore@polygons[[7]],baltimore@polygons[[7]],baltimore@polygons[[7]],
  baltimore@polygons[[8]],baltimore@polygons[[8]],baltimore@polygons[[8]],baltimore@polygons[[8]],baltimore@polygons[[8]],baltimore@polygons[[8]],baltimore@polygons[[8]],
  baltimore@polygons[[9]],baltimore@polygons[[9]],baltimore@polygons[[9]],baltimore@polygons[[9]],baltimore@polygons[[9]],baltimore@polygons[[9]],baltimore@polygons[[9]],
  baltimore@polygons[[10]],baltimore@polygons[[10]],baltimore@polygons[[10]],baltimore@polygons[[10]],baltimore@polygons[[10]],baltimore@polygons[[10]],baltimore@polygons[[10]],
  baltimore@polygons[[11]],baltimore@polygons[[11]],baltimore@polygons[[11]],baltimore@polygons[[11]],baltimore@polygons[[11]],baltimore@polygons[[11]],baltimore@polygons[[11]],
  baltimore@polygons[[12]],baltimore@polygons[[12]],baltimore@polygons[[12]],baltimore@polygons[[12]],baltimore@polygons[[12]],baltimore@polygons[[12]],baltimore@polygons[[12]],
  baltimore@polygons[[13]],baltimore@polygons[[13]],baltimore@polygons[[13]],baltimore@polygons[[13]],baltimore@polygons[[13]],baltimore@polygons[[13]],baltimore@polygons[[13]],
  baltimore@polygons[[14]],baltimore@polygons[[14]],baltimore@polygons[[14]],baltimore@polygons[[14]],baltimore@polygons[[14]],baltimore@polygons[[14]],baltimore@polygons[[14]],
  baltimore@polygons[[15]],baltimore@polygons[[15]],baltimore@polygons[[15]],baltimore@polygons[[15]],baltimore@polygons[[15]],baltimore@polygons[[15]],baltimore@polygons[[15]],
  baltimore@polygons[[16]],baltimore@polygons[[16]],baltimore@polygons[[16]],baltimore@polygons[[16]],baltimore@polygons[[16]],baltimore@polygons[[16]],baltimore@polygons[[16]],
  baltimore@polygons[[17]],baltimore@polygons[[17]],baltimore@polygons[[17]],baltimore@polygons[[17]],baltimore@polygons[[17]],baltimore@polygons[[17]],baltimore@polygons[[17]],
  baltimore@polygons[[18]],baltimore@polygons[[18]],baltimore@polygons[[18]],baltimore@polygons[[18]],baltimore@polygons[[18]],baltimore@polygons[[18]],baltimore@polygons[[18]],
  baltimore@polygons[[19]],baltimore@polygons[[19]],baltimore@polygons[[19]],baltimore@polygons[[19]],baltimore@polygons[[19]],baltimore@polygons[[19]],baltimore@polygons[[19]],
  baltimore@polygons[[20]],baltimore@polygons[[20]],baltimore@polygons[[20]],baltimore@polygons[[20]],baltimore@polygons[[20]],baltimore@polygons[[20]],baltimore@polygons[[20]],
  baltimore@polygons[[21]],baltimore@polygons[[21]],baltimore@polygons[[21]],baltimore@polygons[[21]],baltimore@polygons[[21]],baltimore@polygons[[21]],baltimore@polygons[[21]],
  baltimore@polygons[[22]],baltimore@polygons[[22]],baltimore@polygons[[22]],baltimore@polygons[[22]],baltimore@polygons[[22]],baltimore@polygons[[22]],baltimore@polygons[[22]],
  baltimore@polygons[[23]],baltimore@polygons[[23]],baltimore@polygons[[23]],baltimore@polygons[[23]],baltimore@polygons[[23]],baltimore@polygons[[23]],baltimore@polygons[[23]],
  baltimore@polygons[[24]],baltimore@polygons[[24]],baltimore@polygons[[24]],baltimore@polygons[[24]],baltimore@polygons[[24]],baltimore@polygons[[24]],baltimore@polygons[[24]],
  baltimore@polygons[[25]],baltimore@polygons[[25]],baltimore@polygons[[25]],baltimore@polygons[[25]],baltimore@polygons[[25]],baltimore@polygons[[25]],baltimore@polygons[[25]],
  baltimore@polygons[[26]],baltimore@polygons[[26]],baltimore@polygons[[26]],baltimore@polygons[[26]],baltimore@polygons[[26]],baltimore@polygons[[26]],baltimore@polygons[[26]],
  baltimore@polygons[[27]],baltimore@polygons[[27]],baltimore@polygons[[27]],baltimore@polygons[[27]],baltimore@polygons[[27]],baltimore@polygons[[27]],baltimore@polygons[[27]],
  baltimore@polygons[[28]],baltimore@polygons[[28]],baltimore@polygons[[28]],baltimore@polygons[[28]],baltimore@polygons[[28]],baltimore@polygons[[28]],baltimore@polygons[[28]],
  baltimore@polygons[[29]],baltimore@polygons[[29]],baltimore@polygons[[29]],baltimore@polygons[[29]],baltimore@polygons[[29]],baltimore@polygons[[29]],baltimore@polygons[[29]],
  baltimore@polygons[[30]],baltimore@polygons[[30]],baltimore@polygons[[30]],baltimore@polygons[[30]],baltimore@polygons[[30]],baltimore@polygons[[30]],baltimore@polygons[[30]],
  baltimore@polygons[[31]],baltimore@polygons[[31]],baltimore@polygons[[31]],baltimore@polygons[[31]],baltimore@polygons[[31]],baltimore@polygons[[31]],baltimore@polygons[[31]],
  baltimore@polygons[[32]],baltimore@polygons[[32]],baltimore@polygons[[32]],baltimore@polygons[[32]],baltimore@polygons[[32]],baltimore@polygons[[32]],baltimore@polygons[[32]],
  baltimore@polygons[[33]],baltimore@polygons[[33]],baltimore@polygons[[33]],baltimore@polygons[[33]],baltimore@polygons[[33]],baltimore@polygons[[33]],baltimore@polygons[[33]],
  baltimore@polygons[[34]],baltimore@polygons[[34]],baltimore@polygons[[34]],baltimore@polygons[[34]],baltimore@polygons[[34]],baltimore@polygons[[34]],baltimore@polygons[[34]],
  baltimore@polygons[[35]],baltimore@polygons[[35]],baltimore@polygons[[35]],baltimore@polygons[[35]],baltimore@polygons[[35]],baltimore@polygons[[35]],baltimore@polygons[[35]],
  baltimore@polygons[[36]],baltimore@polygons[[36]],baltimore@polygons[[36]],baltimore@polygons[[36]],baltimore@polygons[[36]],baltimore@polygons[[36]],baltimore@polygons[[36]],
  baltimore@polygons[[37]],baltimore@polygons[[37]],baltimore@polygons[[37]],baltimore@polygons[[37]],baltimore@polygons[[37]],baltimore@polygons[[37]],baltimore@polygons[[37]],
  baltimore@polygons[[38]],baltimore@polygons[[38]],baltimore@polygons[[38]],baltimore@polygons[[38]],baltimore@polygons[[38]],baltimore@polygons[[38]],baltimore@polygons[[38]],
  baltimore@polygons[[39]],baltimore@polygons[[39]],baltimore@polygons[[39]],baltimore@polygons[[39]],baltimore@polygons[[39]],baltimore@polygons[[39]],baltimore@polygons[[39]],
  baltimore@polygons[[40]],baltimore@polygons[[40]],baltimore@polygons[[40]],baltimore@polygons[[40]],baltimore@polygons[[40]],baltimore@polygons[[40]],baltimore@polygons[[40]],
  baltimore@polygons[[41]],baltimore@polygons[[41]],baltimore@polygons[[41]],baltimore@polygons[[41]],baltimore@polygons[[41]],baltimore@polygons[[41]],baltimore@polygons[[41]],
  baltimore@polygons[[42]],baltimore@polygons[[42]],baltimore@polygons[[42]],baltimore@polygons[[42]],baltimore@polygons[[42]],baltimore@polygons[[42]],baltimore@polygons[[42]],
  baltimore@polygons[[43]],baltimore@polygons[[43]],baltimore@polygons[[43]],baltimore@polygons[[43]],baltimore@polygons[[43]],baltimore@polygons[[43]],baltimore@polygons[[43]],
  baltimore@polygons[[44]],baltimore@polygons[[44]],baltimore@polygons[[44]],baltimore@polygons[[44]],baltimore@polygons[[44]],baltimore@polygons[[44]],baltimore@polygons[[44]],
  baltimore@polygons[[45]],baltimore@polygons[[45]],baltimore@polygons[[45]],baltimore@polygons[[45]],baltimore@polygons[[45]],baltimore@polygons[[45]],baltimore@polygons[[45]],
  baltimore@polygons[[46]],baltimore@polygons[[46]],baltimore@polygons[[46]],baltimore@polygons[[46]],baltimore@polygons[[46]],baltimore@polygons[[46]],baltimore@polygons[[46]],
  baltimore@polygons[[47]],baltimore@polygons[[47]],baltimore@polygons[[47]],baltimore@polygons[[47]],baltimore@polygons[[47]],baltimore@polygons[[47]],baltimore@polygons[[47]],
  baltimore@polygons[[48]],baltimore@polygons[[48]],baltimore@polygons[[48]],baltimore@polygons[[48]],baltimore@polygons[[48]],baltimore@polygons[[48]],baltimore@polygons[[48]],
  baltimore@polygons[[49]],baltimore@polygons[[49]],baltimore@polygons[[49]],baltimore@polygons[[49]],baltimore@polygons[[49]],baltimore@polygons[[49]],baltimore@polygons[[49]],
  baltimore@polygons[[50]],baltimore@polygons[[50]],baltimore@polygons[[50]],baltimore@polygons[[50]],baltimore@polygons[[50]],baltimore@polygons[[50]],baltimore@polygons[[50]],
  baltimore@polygons[[51]],baltimore@polygons[[51]],baltimore@polygons[[51]],baltimore@polygons[[51]],baltimore@polygons[[51]],baltimore@polygons[[51]],baltimore@polygons[[51]],
  baltimore@polygons[[52]],baltimore@polygons[[52]],baltimore@polygons[[52]],baltimore@polygons[[52]],baltimore@polygons[[52]],baltimore@polygons[[52]],baltimore@polygons[[52]],
  baltimore@polygons[[53]],baltimore@polygons[[53]],baltimore@polygons[[53]],baltimore@polygons[[53]],baltimore@polygons[[53]],baltimore@polygons[[53]],baltimore@polygons[[53]],
  baltimore@polygons[[54]],baltimore@polygons[[54]],baltimore@polygons[[54]],baltimore@polygons[[54]],baltimore@polygons[[54]],baltimore@polygons[[54]],baltimore@polygons[[54]],
  baltimore@polygons[[55]],baltimore@polygons[[55]],baltimore@polygons[[55]],baltimore@polygons[[55]],baltimore@polygons[[55]],baltimore@polygons[[55]],baltimore@polygons[[55]],
  baltimore@polygons[[56]],baltimore@polygons[[56]],baltimore@polygons[[56]],baltimore@polygons[[56]],baltimore@polygons[[56]],baltimore@polygons[[56]],baltimore@polygons[[56]]))
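The polygons slot used above is an ordinary list whose elements happen to be S4 objects, and rep accepts lists. The sketch below illustrates this with a small hypothetical S4 class, `ToyPolygon`, standing in for the sp polygon objects. By analogy, `rep(baltimore@polygons, each = 7)` should produce the same 392-element list as the manual enumeration, although this was not checked against the original data.

```r
library(methods)

# Hypothetical stand-in for the S4 polygon objects stored in baltimore@polygons.
setClass("ToyPolygon", representation(ID = "character"))

# A plain list holding three S4 objects, mimicking the polygons slot.
toy_polygons <- list(
  new("ToyPolygon", ID = "area1"),
  new("ToyPolygon", ID = "area2"),
  new("ToyPolygon", ID = "area3")
)

# rep() works on the list: each element is repeated 7 times in a row,
# giving the same order as writing every element out seven times by hand.
geometry <- rep(toy_polygons, each = 7)

# Read the ID back from every replicated object to confirm the ordering.
ids <- vapply(geometry, function(p) p@ID, character(1))
print(table(ids))
```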

baltimore_alias <- baltimore

baltimore_alias@polygons <- anim_tibble$geometry

baltimore_alias@data$community %in% anim_tibble$Community #Again, we ensure that we have a perfect match

baltimore_alias@data <-left_join(baltimore_alias@data,anim_tibble,by = c('community' = 'Community'))

bbox <- baltimore@bbox
pb <-  c(0,25,50,75,100,125,150,175,200,225,250)

animated_crime_map <- tm_shape(baltimore_alias,bbox = bbox, projection = crs.geo1) +
  tm_polygons("CrimeRate",breaks=pb,title ="Crime") +
  tm_facets(free.scales.fill = F,along = "Year")+tm_shape(baltimore)+tm_borders()

tmap_animation(animated_crime_map, filename ="animated_crime_map.gif", delay=85)

We can see that crime first increases in most areas before decreasing from 2018 onwards.

We see that crime peaks in 2017, before decreasing in most areas, except areas in the city center such as Downtown/Seton Hill.

#### 3.4.1 Calculation of violent crime and property crime evolution

We can make the exact same computations as before to calculate violent crime and property crime evolution.

Violent_Crime_in_2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-01-01") & CrimeDateTime <= as.Date("2021-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2020 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2020-01-01") & CrimeDateTime <= as.Date("2020-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2019 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2019-01-01") & CrimeDateTime <= as.Date("2019-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2018 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2018-01-01") & CrimeDateTime <= as.Date("2018-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2017 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2017-01-01") & CrimeDateTime <= as.Date("2017-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2016 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2016-01-01") & CrimeDateTime <= as.Date("2016-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2015 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2015-01-01") & CrimeDateTime <= as.Date("2015-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

Violent_Crime_in_2014 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2014-01-01") & CrimeDateTime <= as.Date("2014-12-31")) %>% filter(VIO_PROP_CFS=="VIOLENT")

ViolentCrimePerCapitaPerArea2021 <- Violent_Crime_in_2021 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency21=n())

ViolentCrimePerCapitaPerArea2021 <-  mutate(ViolentCrimePerCapitaPerArea2021,ViolentCrimePer1000inhabitants21=((ViolentCrimePerCapitaPerArea2021$ViolentCrimeFrequency21/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2021 <- rbind(ViolentCrimePerCapitaPerArea2021,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2020 <- Violent_Crime_in_2020 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency20=n())

ViolentCrimePerCapitaPerArea2020 <-  mutate(ViolentCrimePerCapitaPerArea2020,ViolentCrimePer1000inhabitants20=((ViolentCrimePerCapitaPerArea2020$ViolentCrimeFrequency20/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2020 <- rbind(ViolentCrimePerCapitaPerArea2020,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2019 <- Violent_Crime_in_2019 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency19=n())

ViolentCrimePerCapitaPerArea2019 <-  mutate(ViolentCrimePerCapitaPerArea2019,ViolentCrimePer1000inhabitants19=((ViolentCrimePerCapitaPerArea2019$ViolentCrimeFrequency19/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2019 <- rbind(ViolentCrimePerCapitaPerArea2019,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2018 <- Violent_Crime_in_2018 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency18=n())

ViolentCrimePerCapitaPerArea2018 <-  mutate(ViolentCrimePerCapitaPerArea2018,ViolentCrimePer1000inhabitants18=((ViolentCrimePerCapitaPerArea2018$ViolentCrimeFrequency18/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2018 <- rbind(ViolentCrimePerCapitaPerArea2018,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2017 <- Violent_Crime_in_2017 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency17=n())

ViolentCrimePerCapitaPerArea2017 <-  mutate(ViolentCrimePerCapitaPerArea2017,ViolentCrimePer1000inhabitants17=((ViolentCrimePerCapitaPerArea2017$ViolentCrimeFrequency17/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2017 <- rbind(ViolentCrimePerCapitaPerArea2017,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2016 <- Violent_Crime_in_2016 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency16=n())

ViolentCrimePerCapitaPerArea2016 <-  mutate(ViolentCrimePerCapitaPerArea2016,ViolentCrimePer1000inhabitants16=((ViolentCrimePerCapitaPerArea2016$ViolentCrimeFrequency16/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2016 <- rbind(ViolentCrimePerCapitaPerArea2016,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2015 <- Violent_Crime_in_2015 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency15=n())

ViolentCrimePerCapitaPerArea2015 <-  mutate(ViolentCrimePerCapitaPerArea2015,ViolentCrimePer1000inhabitants15=((ViolentCrimePerCapitaPerArea2015$ViolentCrimeFrequency15/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2015 <- rbind(ViolentCrimePerCapitaPerArea2015,list("Unassigned -- Jail",0,0))

ViolentCrimePerCapitaPerArea2014 <- Violent_Crime_in_2014 %>% 
  group_by(Community) %>%
  summarize(ViolentCrimeFrequency14=n())

ViolentCrimePerCapitaPerArea2014 <-  mutate(ViolentCrimePerCapitaPerArea2014,ViolentCrimePer1000inhabitants14=((ViolentCrimePerCapitaPerArea2014$ViolentCrimeFrequency14/population_data$tpop20)*1000))

ViolentCrimePerCapitaPerArea2014 <- rbind(ViolentCrimePerCapitaPerArea2014,list("Unassigned -- Jail",0,0))

Violent_crime_evolution <- ViolentCrimePerCapitaPerArea2021 %>% 
  left_join(ViolentCrimePerCapitaPerArea2020,by="Community") %>% 
  left_join(ViolentCrimePerCapitaPerArea2019,by="Community") %>%
  left_join(ViolentCrimePerCapitaPerArea2018,by="Community") %>%
  left_join(ViolentCrimePerCapitaPerArea2017,by="Community") %>% 
  left_join(ViolentCrimePerCapitaPerArea2016,by="Community") %>% 
  left_join(ViolentCrimePerCapitaPerArea2015,by="Community") %>% 
  left_join(ViolentCrimePerCapitaPerArea2014,by="Community")

Community_data <- Community_data %>% 
  left_join(Violent_crime_evolution,by="Community")

Violent_Crime_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(VIO_PROP_CFS=="VIOLENT") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = str_wrap(("Overall, violent crime seems to have increased for the 2017 to 2020 period"),width=65),x="Year",y="Violent crime occurrences")

Violent_Crime_Yearly_evolution_map
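
The year-by-year blocks above all follow the same pattern: filter one year, count crimes per Community, divide by population, then join. As a minimal sketch only, and not the code used for this report, the same kind of table could be built in one pass. The toy data frames `crimes` and `population`, the helper `rate_per_1000` and the `Per1000_` column prefix are hypothetical, and the sketch assumes the `tidyr` package is available. Unlike the blocks above, it matches populations by Community name rather than by row position, so a missing or reordered area cannot shift the rates.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy stand-ins for crime_data_with_areas and population_data (hypothetical values)
crimes <- tibble(
  Community     = c("A", "A", "B", "B", "B", "A"),
  CrimeDateTime = as.Date(c("2019-03-01", "2020-05-10", "2019-07-04",
                            "2020-01-15", "2020-11-30", "2020-08-21")),
  VIO_PROP_CFS  = c("VIOLENT", "VIOLENT", "VIOLENT",
                    "PROPERTY", "VIOLENT", "VIOLENT")
)
population <- tibble(Community = c("A", "B"), tpop20 = c(2000, 4000))

# Hypothetical helper: yearly rate per 1000 inhabitants for one crime type
rate_per_1000 <- function(data, pop, type) {
  data %>%
    filter(VIO_PROP_CFS == type) %>%
    count(Community, Year = year(CrimeDateTime), name = "Frequency") %>%
    left_join(pop, by = "Community") %>%   # match population by area name
    mutate(Per1000 = Frequency / tpop20 * 1000) %>%
    select(Community, Year, Per1000) %>%
    pivot_wider(names_from = Year, values_from = Per1000,
                names_prefix = "Per1000_", values_fill = 0)
}

violent_rates <- rate_per_1000(crimes, population, "VIOLENT")
violent_rates
```

The result has one row per Community and one rate column per year, which is the shape the successive left joins above produce.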


Property_Crime_in_2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-01-01") & CrimeDateTime <= as.Date("2021-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2020 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2020-01-01") & CrimeDateTime <= as.Date("2020-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2019 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2019-01-01") & CrimeDateTime <= as.Date("2019-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2018 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2018-01-01") & CrimeDateTime <= as.Date("2018-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2017 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2017-01-01") & CrimeDateTime <= as.Date("2017-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2016 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2016-01-01") & CrimeDateTime <= as.Date("2016-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2015 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2015-01-01") & CrimeDateTime <= as.Date("2015-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

Property_Crime_in_2014 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2014-01-01") & CrimeDateTime <= as.Date("2014-12-31")) %>% filter(VIO_PROP_CFS=="PROPERTY")

PropertyCrimePerCapitaPerArea2021 <- Property_Crime_in_2021 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency21=n())

PropertyCrimePerCapitaPerArea2021 <-  mutate(PropertyCrimePerCapitaPerArea2021,PropertyCrimePer1000inhabitants21=((PropertyCrimePerCapitaPerArea2021$PropertyCrimeFrequency21/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2021 <- rbind(PropertyCrimePerCapitaPerArea2021,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2020 <- Property_Crime_in_2020 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency20=n())

PropertyCrimePerCapitaPerArea2020 <-  mutate(PropertyCrimePerCapitaPerArea2020,PropertyCrimePer1000inhabitants20=((PropertyCrimePerCapitaPerArea2020$PropertyCrimeFrequency20/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2020 <- rbind(PropertyCrimePerCapitaPerArea2020,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2019 <- Property_Crime_in_2019 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency19=n())

PropertyCrimePerCapitaPerArea2019 <-  mutate(PropertyCrimePerCapitaPerArea2019,PropertyCrimePer1000inhabitants19=((PropertyCrimePerCapitaPerArea2019$PropertyCrimeFrequency19/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2019 <- rbind(PropertyCrimePerCapitaPerArea2019,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2018 <- Property_Crime_in_2018 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency18=n())

PropertyCrimePerCapitaPerArea2018 <-  mutate(PropertyCrimePerCapitaPerArea2018,PropertyCrimePer1000inhabitants18=((PropertyCrimePerCapitaPerArea2018$PropertyCrimeFrequency18/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2018 <- rbind(PropertyCrimePerCapitaPerArea2018,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2017 <- Property_Crime_in_2017 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency17=n())

PropertyCrimePerCapitaPerArea2017 <-  mutate(PropertyCrimePerCapitaPerArea2017,PropertyCrimePer1000inhabitants17=((PropertyCrimePerCapitaPerArea2017$PropertyCrimeFrequency17/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2017 <- rbind(PropertyCrimePerCapitaPerArea2017,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2016 <- Property_Crime_in_2016 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency16=n())

PropertyCrimePerCapitaPerArea2016 <-  mutate(PropertyCrimePerCapitaPerArea2016,PropertyCrimePer1000inhabitants16=((PropertyCrimePerCapitaPerArea2016$PropertyCrimeFrequency16/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2016 <- rbind(PropertyCrimePerCapitaPerArea2016,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2015 <- Property_Crime_in_2015 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency15=n())

PropertyCrimePerCapitaPerArea2015 <-  mutate(PropertyCrimePerCapitaPerArea2015,PropertyCrimePer1000inhabitants15=((PropertyCrimePerCapitaPerArea2015$PropertyCrimeFrequency15/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2015 <- rbind(PropertyCrimePerCapitaPerArea2015,list("Unassigned -- Jail",0,0))

PropertyCrimePerCapitaPerArea2014 <- Property_Crime_in_2014 %>% 
  group_by(Community) %>%
  summarize(PropertyCrimeFrequency14=n())

PropertyCrimePerCapitaPerArea2014 <-  mutate(PropertyCrimePerCapitaPerArea2014,PropertyCrimePer1000inhabitants14=((PropertyCrimePerCapitaPerArea2014$PropertyCrimeFrequency14/population_data$tpop20)*1000))

PropertyCrimePerCapitaPerArea2014 <- rbind(PropertyCrimePerCapitaPerArea2014,list("Unassigned -- Jail",0,0))

Property_crime_evolution <- PropertyCrimePerCapitaPerArea2021 %>% 
  left_join(PropertyCrimePerCapitaPerArea2020,by="Community") %>% 
  left_join(PropertyCrimePerCapitaPerArea2019,by="Community") %>%
  left_join(PropertyCrimePerCapitaPerArea2018,by="Community") %>%
  left_join(PropertyCrimePerCapitaPerArea2017,by="Community") %>% 
  left_join(PropertyCrimePerCapitaPerArea2016,by="Community") %>% 
  left_join(PropertyCrimePerCapitaPerArea2015,by="Community") %>% 
  left_join(PropertyCrimePerCapitaPerArea2014,by="Community")

Community_data <- Community_data %>% 
  left_join(Property_crime_evolution,by="Community")

Property_Crime_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(VIO_PROP_CFS=="PROPERTY") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = str_wrap(("Overall, property crime seems to have decreased for the 2017 to 2020 period"),width=65),x="Year",y="Property crime occurrences")

Property_Crime_Yearly_evolution_map

#### 3.4.2 Calculation of felony and misdemeanor evolution

We can make the exact same computation to calculate felony and misdemeanor evolution.

Felony_in_2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-01-01") & CrimeDateTime <= as.Date("2021-12-31")) %>% filter(Category=="Felony")

Felony_in_2020 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2020-01-01") & CrimeDateTime <= as.Date("2020-12-31")) %>% filter(Category=="Felony")

Felony_in_2019 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2019-01-01") & CrimeDateTime <= as.Date("2019-12-31")) %>% filter(Category=="Felony")

Felony_in_2018 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2018-01-01") & CrimeDateTime <= as.Date("2018-12-31")) %>% filter(Category=="Felony")

Felony_in_2017 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2017-01-01") & CrimeDateTime <= as.Date("2017-12-31")) %>% filter(Category=="Felony")

Felony_in_2016 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2016-01-01") & CrimeDateTime <= as.Date("2016-12-31")) %>% filter(Category=="Felony")

Felony_in_2015 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2015-01-01") & CrimeDateTime <= as.Date("2015-12-31")) %>% filter(Category=="Felony")

Felony_in_2014 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2014-01-01") & CrimeDateTime <= as.Date("2014-12-31")) %>% filter(Category=="Felony")

FelonyPerCapitaPerArea2021 <- Felony_in_2021 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency21=n())

FelonyPerCapitaPerArea2021 <-  mutate(FelonyPerCapitaPerArea2021,FelonyPer1000inhabitants21=((FelonyPerCapitaPerArea2021$FelonyFrequency21/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2021 <- rbind(FelonyPerCapitaPerArea2021,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2020 <- Felony_in_2020 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency20=n())

FelonyPerCapitaPerArea2020 <-  mutate(FelonyPerCapitaPerArea2020,FelonyPer1000inhabitants20=((FelonyPerCapitaPerArea2020$FelonyFrequency20/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2020 <- rbind(FelonyPerCapitaPerArea2020,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2019 <- Felony_in_2019 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency19=n())

FelonyPerCapitaPerArea2019 <-  mutate(FelonyPerCapitaPerArea2019,FelonyPer1000inhabitants19=((FelonyPerCapitaPerArea2019$FelonyFrequency19/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2019 <- rbind(FelonyPerCapitaPerArea2019,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2018 <- Felony_in_2018 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency18=n())

FelonyPerCapitaPerArea2018 <-  mutate(FelonyPerCapitaPerArea2018,FelonyPer1000inhabitants18=((FelonyPerCapitaPerArea2018$FelonyFrequency18/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2018 <- rbind(FelonyPerCapitaPerArea2018,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2017 <- Felony_in_2017 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency17=n())

FelonyPerCapitaPerArea2017 <-  mutate(FelonyPerCapitaPerArea2017,FelonyPer1000inhabitants17=((FelonyPerCapitaPerArea2017$FelonyFrequency17/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2017 <- rbind(FelonyPerCapitaPerArea2017,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2016 <- Felony_in_2016 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency16=n())

FelonyPerCapitaPerArea2016 <-  mutate(FelonyPerCapitaPerArea2016,FelonyPer1000inhabitants16=((FelonyPerCapitaPerArea2016$FelonyFrequency16/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2016 <- rbind(FelonyPerCapitaPerArea2016,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2015 <- Felony_in_2015 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency15=n())

FelonyPerCapitaPerArea2015 <-  mutate(FelonyPerCapitaPerArea2015,FelonyPer1000inhabitants15=((FelonyPerCapitaPerArea2015$FelonyFrequency15/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2015 <- rbind(FelonyPerCapitaPerArea2015,list("Unassigned -- Jail",0,0))

FelonyPerCapitaPerArea2014 <- Felony_in_2014 %>% 
  group_by(Community) %>%
  summarize(FelonyFrequency14=n())

FelonyPerCapitaPerArea2014 <-  mutate(FelonyPerCapitaPerArea2014,FelonyPer1000inhabitants14=((FelonyPerCapitaPerArea2014$FelonyFrequency14/population_data$tpop20)*1000))

FelonyPerCapitaPerArea2014 <- rbind(FelonyPerCapitaPerArea2014,list("Unassigned -- Jail",0,0))

Felony_evolution <- FelonyPerCapitaPerArea2021 %>% 
  left_join(FelonyPerCapitaPerArea2020,by="Community") %>% 
  left_join(FelonyPerCapitaPerArea2019,by="Community") %>%
  left_join(FelonyPerCapitaPerArea2018,by="Community") %>%
  left_join(FelonyPerCapitaPerArea2017,by="Community") %>% 
  left_join(FelonyPerCapitaPerArea2016,by="Community") %>% 
  left_join(FelonyPerCapitaPerArea2015,by="Community") %>% 
  left_join(FelonyPerCapitaPerArea2014,by="Community")

Community_data <- Community_data %>% 
  left_join(Felony_evolution,by="Community")

Felony_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(Category=="Felony") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = "In Baltimore, felonies started to decrease from 2017 onwards",x="Year",y="Felony occurrences")

Felony_Yearly_evolution_map


Misdemeanor_in_2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-01-01") & CrimeDateTime <= as.Date("2021-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2020 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2020-01-01") & CrimeDateTime <= as.Date("2020-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2019 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2019-01-01") & CrimeDateTime <= as.Date("2019-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2018 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2018-01-01") & CrimeDateTime <= as.Date("2018-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2017 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2017-01-01") & CrimeDateTime <= as.Date("2017-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2016 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2016-01-01") & CrimeDateTime <= as.Date("2016-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2015 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2015-01-01") & CrimeDateTime <= as.Date("2015-12-31")) %>% filter(Category=="Misdemeanor")

Misdemeanor_in_2014 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2014-01-01") & CrimeDateTime <= as.Date("2014-12-31")) %>% filter(Category=="Misdemeanor")

MisdemeanorPerCapitaPerArea2021 <- Misdemeanor_in_2021 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency21=n())

MisdemeanorPerCapitaPerArea2021 <-  mutate(MisdemeanorPerCapitaPerArea2021,MisdemeanorPer1000inhabitants21=((MisdemeanorPerCapitaPerArea2021$MisdemeanorFrequency21/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2021 <- rbind(MisdemeanorPerCapitaPerArea2021,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2020 <- Misdemeanor_in_2020 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency20=n())

MisdemeanorPerCapitaPerArea2020 <-  mutate(MisdemeanorPerCapitaPerArea2020,MisdemeanorPer1000inhabitants20=((MisdemeanorPerCapitaPerArea2020$MisdemeanorFrequency20/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2020 <- rbind(MisdemeanorPerCapitaPerArea2020,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2019 <- Misdemeanor_in_2019 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency19=n())

MisdemeanorPerCapitaPerArea2019 <-  mutate(MisdemeanorPerCapitaPerArea2019,MisdemeanorPer1000inhabitants19=((MisdemeanorPerCapitaPerArea2019$MisdemeanorFrequency19/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2019 <- rbind(MisdemeanorPerCapitaPerArea2019,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2018 <- Misdemeanor_in_2018 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency18=n())

MisdemeanorPerCapitaPerArea2018 <-  mutate(MisdemeanorPerCapitaPerArea2018,MisdemeanorPer1000inhabitants18=((MisdemeanorPerCapitaPerArea2018$MisdemeanorFrequency18/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2018 <- rbind(MisdemeanorPerCapitaPerArea2018,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2017 <- Misdemeanor_in_2017 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency17=n())

MisdemeanorPerCapitaPerArea2017 <-  mutate(MisdemeanorPerCapitaPerArea2017,MisdemeanorPer1000inhabitants17=((MisdemeanorPerCapitaPerArea2017$MisdemeanorFrequency17/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2017 <- rbind(MisdemeanorPerCapitaPerArea2017,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2016 <- Misdemeanor_in_2016 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency16=n())

MisdemeanorPerCapitaPerArea2016 <-  mutate(MisdemeanorPerCapitaPerArea2016,MisdemeanorPer1000inhabitants16=((MisdemeanorPerCapitaPerArea2016$MisdemeanorFrequency16/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2016 <- rbind(MisdemeanorPerCapitaPerArea2016,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2015 <- Misdemeanor_in_2015 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency15=n())

MisdemeanorPerCapitaPerArea2015 <-  mutate(MisdemeanorPerCapitaPerArea2015,MisdemeanorPer1000inhabitants15=((MisdemeanorPerCapitaPerArea2015$MisdemeanorFrequency15/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2015 <- rbind(MisdemeanorPerCapitaPerArea2015,list("Unassigned -- Jail",0,0))

MisdemeanorPerCapitaPerArea2014 <- Misdemeanor_in_2014 %>% 
  group_by(Community) %>%
  summarize(MisdemeanorFrequency14=n())

MisdemeanorPerCapitaPerArea2014 <-  mutate(MisdemeanorPerCapitaPerArea2014,MisdemeanorPer1000inhabitants14=((MisdemeanorPerCapitaPerArea2014$MisdemeanorFrequency14/population_data$tpop20)*1000))

MisdemeanorPerCapitaPerArea2014 <- rbind(MisdemeanorPerCapitaPerArea2014,list("Unassigned -- Jail",0,0))

Misdemeanor_evolution <- MisdemeanorPerCapitaPerArea2021 %>% 
  left_join(MisdemeanorPerCapitaPerArea2020,by="Community") %>% 
  left_join(MisdemeanorPerCapitaPerArea2019,by="Community") %>%
  left_join(MisdemeanorPerCapitaPerArea2018,by="Community") %>%
  left_join(MisdemeanorPerCapitaPerArea2017,by="Community") %>% 
  left_join(MisdemeanorPerCapitaPerArea2016,by="Community") %>% 
  left_join(MisdemeanorPerCapitaPerArea2015,by="Community") %>% 
  left_join(MisdemeanorPerCapitaPerArea2014,by="Community")

Community_data <- Community_data %>% 
  left_join(Misdemeanor_evolution,by="Community")

Misdemeanor_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(Category=="Misdemeanor") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = "In Baltimore, misdemeanors started to decrease from 2017 onwards",x="Year",y="Misdemeanor occurrences")

Misdemeanor_Yearly_evolution_map


Over the 2014-2019 period, one can observe a rather worrying tendency: violent and more severe crimes increased (in absolute terms), while less severe crimes and property crime decreased.

### 3.5 Comparison of crimes and wealth

We want to investigate whether or not wealthier areas are more impacted by crime. To do so, we once more compute a simple linear regression, this time of crime per 1000 inhabitants on the poverty level. The \(R^2\) is low (0.242), indicating a rather weak correlation. Yet, looking at the scatter plot, one can still notice a tendency for poorer areas to be more strongly impacted by crime. One noticeable outlier is Downtown/Seton Hill, the area right in the city center of Baltimore. This area is special: crime there is extremely high, yet it is one of the less poor areas of Baltimore. A plausible explanation is that city centers are usually not the poorest areas of a city.

Crime_VS_Poverty <- CrimeStatsPerArea %>% 
  left_join(poverty_data,by=c("Community"="CSA2010"))

regression3 <- lm(Crime_VS_Poverty$CrimePer1000inhabitants~Crime_VS_Poverty$hhpov19)

Poverty VS Crime

| | Dependent variable: Crime (per 1000 inhabitants) |
|---|---|
| Poverty level | 14.200*** (3.430) |
| Intercept | 392.000*** (68.200) |
| Observations | 56 |
| R2 | 0.242 |
| Adjusted R2 | 0.228 |
| Residual Std. Error | 288.000 (df = 54) |
| F Statistic | 17.300*** (df = 1; 54) |

Note: * p<0.1; ** p<0.05; *** p<0.01


Downtown_label_2 <- Crime_VS_Poverty[14,]

ggplot(data=Crime_VS_Poverty,mapping= aes(x=hhpov19,y=CrimePer1000inhabitants)) + 
  labs(title = "Poorer areas tend to be more strongly impacted by crime", x="Poverty rate",y="Crime per Capita")+
  geom_point(data=Downtown_label_2) +
  ggrepel::geom_label_repel(aes(label=Community),data = Downtown_label_2,min.segment.length = 0.5)+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)
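
To make the reading of the regression table concrete, here is a minimal sketch on toy data. The data frame `toy` and its values are hypothetical and only reuse the column names of Crime_VS_Poverty; it is not the model reported above. It shows where the slope and the \(R^2\) come from, and that with a single predictor the \(R^2\) is simply the squared Pearson correlation.

```r
# Toy data (hypothetical values), standing in for Crime_VS_Poverty
toy <- data.frame(
  hhpov19                 = c(5, 10, 15, 20, 25, 30),
  CrimePer1000inhabitants = c(420, 600, 560, 700, 810, 760)
)

# Passing `data =` keeps coefficient names short (hhpov19, not toy$hhpov19)
fit <- lm(CrimePer1000inhabitants ~ hhpov19, data = toy)

# Slope: change in crime rate for a one-point rise in the poverty level
slope <- coef(fit)[["hhpov19"]]

# R^2: share of the variance in crime explained by the model
r_squared <- summary(fit)$r.squared

# With one predictor, R^2 equals the squared Pearson correlation
r_from_cor <- cor(toy$hhpov19, toy$CrimePer1000inhabitants)^2
```

An \(R^2\) of 0.242, as in the table above, therefore corresponds to a correlation of roughly 0.49 in absolute value between poverty level and crime per capita.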

## 4 Analysis

### 4.1 CCTVs VS Crime - Does the presence of CCTV deter crime?

To be in a position to comment on the effectiveness of CCTVs as a crime deterrent, we first investigate the relationship between CCTV density and crime per capita. Our EDA already showed that both crime and CCTVs are highly concentrated in the city center, which hints at a correlation. To verify this observation, we fit a simple linear regression model. We first create a new data frame called CCTV_VS_crimes (essentially a left join of CCTV_per_area and CrimeStatsPerArea). The regression results indicate a moderate positive correlation between CCTV density and crime per capita: the \(R^2\) is 42.9%. Plotting the observations makes this tendency visible. The blue line represents the regression line.

CCTV_VS_crimes <- CCTV_per_area %>% 
  left_join(CrimeStatsPerArea,by="Community")
  
regression <- lm(CCTV_VS_crimes$CrimePer1000inhabitants~CCTV_VS_crimes$density_perc)

Crime VS CCTVs

| | Dependent variable: Crime (per 1000 inhabitants) |
|---|---|
| CCTV Density | 91.200*** (14.300) |
| Intercept | 463.000*** (42.000) |
| Observations | 56 |
| R2 | 0.429 |
| Adjusted R2 | 0.419 |
| Residual Std. Error | 250.000 (df = 54) |
| F Statistic | 40.600*** (df = 1; 54) |

Note: * p<0.1; ** p<0.05; *** p<0.01

ggplot(data=CCTV_VS_crimes,mapping= aes(x=density_perc,y=CrimePer1000inhabitants)) + 
  labs(title = "As CCTV density increases, crime tends to increase as well.", x="CCTV density",y="Crime (per 1000 inhabitants)")+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)
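
Because CCTV_VS_crimes is built with a left join, any area whose name is spelled differently in the two tables silently ends up with NA crime figures and drops out of the regression. A minimal sketch of how this could be checked, on toy data: the data frames `cctv` and `crime` and their values are hypothetical stand-ins for CCTV_per_area and CrimeStatsPerArea.

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for CCTV_per_area and CrimeStatsPerArea
cctv  <- tibble(Community = c("A", "B", "C"),
                density_perc = c(1.2, 0.4, 3.1))
crime <- tibble(Community = c("A", "B", "D"),
                CrimePer1000inhabitants = c(500, 450, 900))

# Rows of `cctv` with no partner in `crime`; these get NA after left_join()
unmatched <- anti_join(cctv, crime, by = "Community")

# The left join keeps every row of `cctv`, matched or not
joined    <- left_join(cctv, crime, by = "Community")
n_missing <- sum(is.na(joined$CrimePer1000inhabitants))
```

An empty `unmatched` table means every area found its crime figures, which is the "perfect match" situation the earlier `%in%` check looks for.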

#### 4.1.1 Mapping of CCTVs and crime per capita

Another interesting way to illustrate this correlation is a map depicting crime per capita in each area, with an extra layer representing CCTVs. The method is the same as before with the tmap package. However, this time we have two different shapes: tm_shape(baltimore), which constitutes the base map, and tm_shape(balt_dat), which adds a layer of points. The map confirms the intuition given by the regression. Where crime per capita is lowest, there tend to be fewer CCTVs, and the dark red areas tend to coincide with the highest CCTV concentrations. Looking at the north-western and south-western areas of the map, the placement of CCTVs aligns rather well with the areas considered dangerous. We can also use an interactive map. Its benefit is that clicking on an area gives precise information about its crime per capita.

Crime_per_capita_VS_CCTV_map <- tm_shape(baltimore) + tm_fill(col = "CrimePer1000inhabitants", title ="Crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1, title="Crime seems to be higher in areas with more CCTVs") + tm_shape(balt_dat) + tm_dots(col="black")

tmap_mode("plot")

Crime_per_capita_VS_CCTV_map

baltimore@data[["fid"]]<-baltimore@data[["community"]] #We do that so that we see the name of the Community when using an interactive map

Crime_and_CCTV_map <- tm_shape(baltimore) + tm_fill(col = "CrimePer1000inhabitants", title ="Crime (per 1000 inhabitants)",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.05)+ tm_shape(balt_dat) + tm_dots(col="black")

tmap_mode("view") #We switch to interactive mode before displaying the map

Crime_and_CCTV_map

This map allows one to see CCTV location precisely and to obtain exact crime per capita figures.


However, even though this correlation is interesting and worth mentioning, it does not allow one to say much about the effectiveness of CCTVs. We are stuck in a sort of chicken-and-egg problem: cameras may simply be installed where crime is already high. To be in a position to comment on the crime-deterring potential of CCTVs, we decided to analyse the evolution of crime per capita in certain areas in relation to their respective CCTV density. Our goal is to observe how crime per capita has behaved since 2014 in areas with high CCTV density compared to areas with low CCTV density, and to see whether a clear tendency is noticeable. Indeed, if CCTVs were truly effective, crime should decrease more (or at least increase less) in areas with high CCTV density than in areas with low CCTV density.

To select our areas of interest, we performed a k-means clustering on two dimensions: CCTV density and crime per capita. We first scale the data and add the name of each area as row name. We then use the elbow method to determine the optimal number of clusters with fviz_nbclust. When performing the k-means clustering, we specify the nstart parameter. Otherwise, a single random set of rows is used as initial centers and the result depends on that one draw. We end up with 5 clusters, which we represent using fviz_cluster. The ratio of the between-cluster sum of squares to the total sum of squares is good (85%); the higher this value, the better. Indeed, we want the variation to come from between groups and not from within groups.

library(FactoMineR)
library(factoextra)

sc.Community_data_Clustering <- scale(Community_data[,c(2:89)]) #We scale the data

row.names(sc.Community_data_Clustering) <- as.vector(t(Community_data[,1])) #We add names of area for each row

fviz_nbclust(sc.Community_data_Clustering[,c(1,3)], kmeans, method="wss")+
  geom_vline(xintercept = 5, linetype = 2) + #add line for better visualisation
  labs(subtitle = "Elbow method")  #We can determine the optimal number of clusters; 5 clusters seem reasonable



set.seed(10)
km.clust2 <- kmeans(sc.Community_data_Clustering[,c(1,3)], 5, nstart = 25)
fviz_cluster(km.clust2, data=sc.Community_data_Clustering[,c(1,3)], repel=TRUE)+labs(x="Crime (per 1000 inhabitants)",y="CCTV density",title = str_wrap(("With 5 clusters, we have a rather good between SS to total SS ratio of 85%"),width=65),subtitle=str_wrap(("We will focus on areas in the high crime, high CCTV density cluster as well as those in the high crime, low CCTV density cluster"),width=80))
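
The quantities behind the elbow plot and the quality ratio can also be read directly from the object returned by kmeans. The sketch below is illustrative only: it uses the built-in `USArrests` data set and two of its columns as an assumption, not the Baltimore data, so its numbers are unrelated to the 85% reported here.

```r
# Built-in data set used purely for illustration (not the Baltimore data)
x <- scale(USArrests[, c("Murder", "UrbanPop")])

# Elbow method by hand: total within-cluster sum of squares for k = 1..8
set.seed(10)
wss <- sapply(1:8, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Fit with 25 random starts; kmeans() keeps the best of the 25 runs
set.seed(10)
km <- kmeans(x, centers = 5, nstart = 25)

# The percentage printed by kmeans() is between_SS / total_SS
ratio <- km$betweenss / km$totss
```

Plotting `wss` against the number of clusters gives the curve whose bend ("elbow") suggests a reasonable number of clusters, and `ratio` is the share of the total variation that lies between clusters rather than within them.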


We start by analysing how crime per capita evolved in areas belonging to the “high crime, high CCTV density” cluster. On top of graphically representing this evolution, we also compute the percentage change in crime per capita between 2014 and 2019. We want to see whether a decreasing tendency can be observed in these areas with high CCTV density. We create line charts using ggplot and geom_line. Then, to show these charts in a more concise way, we use grid.arrange. The table below shows the percentage change in crime per capita between 2014 and 2019. For the sake of our analysis, we only consider the 2014-2019 period. Indeed, it seems reasonable to assume that the decrease observed in most areas in 2020 can be attributed to the Covid-19 pandemic.

Downtown_Seton_Hill_evolution <- as.vector(t(Community_data[14,c(25,23,21,19,17,15,13)]))

Year <- c(2014:2020)

Downtown_Seton_Hill <- data.frame(Downtown_Seton_Hill_evolution,Year)

Downtown_Seton_Hill_map <- ggplot(Downtown_Seton_Hill,aes(x=Year,y=Downtown_Seton_Hill_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) + 
  labs(title = "Downtown/Seton Hill",x="Year",y="Crime")

Oldtown_Middle_East_evolution <- as.vector(t(Community_data[41,c(25,23,21,19,17,15,13)]))

Oldtown_Middle_East <- data.frame(Oldtown_Middle_East_evolution,Year)

Oldtown_Middle_East_map <- ggplot(Oldtown_Middle_East,aes(x=Year,y=Oldtown_Middle_East_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) + 
  labs(title = "Oldtown/Middle East",x="Year",y="Crime")

Sandtown_Winchester_Harlem_Park_evolution <- as.vector(t(Community_data[47,c(25,23,21,19,17,15,13)]))

Sandtown_Winchester_Harlem_Park <- data.frame(Sandtown_Winchester_Harlem_Park_evolution,Year)

Sandtown_Winchester_Harlem_Park_map <- ggplot(Sandtown_Winchester_Harlem_Park,aes(x=Year,y=Sandtown_Winchester_Harlem_Park_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title = "Sandtown-Winchester/Harlem Park",x="Year",y="Crime")

Cherry_Hill_evolution <- as.vector(t(Community_data[7,c(25,23,21,19,17,15,13)]))

Cherry_Hill <- data.frame(Cherry_Hill_evolution,Year)

Cherry_Hill_map <- ggplot(Cherry_Hill,aes(x=Year,y=Cherry_Hill_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title = "Cherry Hill",x="Year",y="Crime")

library(pdp)

grid.arrange(Downtown_Seton_Hill_map,Oldtown_Middle_East_map,Sandtown_Winchester_Harlem_Park_map,Cherry_Hill_map,nrow=2,ncol=2,top="Four areas with HIGH crime and HIGH CCTV density")

Crime_Evolution_VS_CCTV <- Community_data[,c(1,25,15,4)] %>% 
  mutate(change_perc=((Community_data$CrimePer1000inhabitants19/Community_data$CrimePer1000inhabitants14)-1)*100)

knitr::kable(Crime_Evolution_VS_CCTV[c(7,14,41,47),c(1,5)],col.names = c('Community','Crime per Capita % change')) %>%
  kable_styling(position = "center")
Community                          Crime per Capita % change
Cherry Hill                        -0.157
Downtown/Seton Hill                39.021
Oldtown/Middle East                25.209
Sandtown-Winchester/Harlem Park    -8.097
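The percentage change in the table follows the usual formula (new / old - 1) * 100. A minimal sketch on a toy data frame is given below; the column names rate_2014 and rate_2019, the area names and the values are hypothetical and only stand in for the corresponding columns of Community_data.

```r
# Toy stand-in for Community_data: one row per area, one rate column per year.
# Column names and values are hypothetical.
toy <- data.frame(
  Community = c("Area A", "Area B"),
  rate_2014 = c(100, 80),
  rate_2019 = c(125, 60)
)

# Percentage change between two years: (new / old - 1) * 100
pct_change <- function(old, new) (new / old - 1) * 100

toy$change_perc <- pct_change(toy$rate_2014, toy$rate_2019)

# Selecting areas by name rather than by row number keeps the selection
# valid even if the row order of the data changes
selected <- toy[toy$Community %in% c("Area A", "Area B"), c("Community", "change_perc")]
selected
```

A positive value means that crime per capita rose over the period, a negative value that it fell.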

For two of these four areas with high crime and high CCTV density, crime per capita actually increased. It was essentially unchanged in Cherry Hill (-0.2%) and decreased more markedly in Sandtown-Winchester/Harlem Park (-8.1%). To be able to comment on CCTV effectiveness, we also analyse how crime per capita evolved in areas with high crime and low CCTV density.

Washington_Village_Pigtown_evolution <- as.vector(t(Community_data[54,c(25,23,21,19,17,15,13)]))

Washington_Village_Pigtown_ <- data.frame(Washington_Village_Pigtown_evolution,Year)

Washington_Village_Pigtown_map <- ggplot(Washington_Village_Pigtown_,aes(x=Year,y=Washington_Village_Pigtown_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title = "Washington Village/Pigtown",x="Year",y="Crime")

Harbor_East_Little_Italy_evolution <- as.vector(t(Community_data[26,c(25,23,21,19,17,15,13)]))

Harbor_East_Little_Italy <- data.frame(Harbor_East_Little_Italy_evolution,Year)

Harbor_East_Little_Italy_map <- ggplot(Harbor_East_Little_Italy,aes(x=Year,y=Harbor_East_Little_Italy_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title = "Harbor East/Little Italy",x="Year",y="Crime")

Madison_East_End_evolution <- as.vector(t(Community_data[33,c(25,23,21,19,17,15,13)]))

Madison_East_End <- data.frame(Madison_East_End_evolution,Year)

Madison_East_End_map <- ggplot(Madison_East_End,aes(x=Year,y=Madison_East_End_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title ="Madison/East End",x="Year",y="Crime")

Southwest_Baltimore_evolution <- as.vector(t(Community_data[51,c(25,23,21,19,17,15,13)]))

Southwest_Baltimore <- data.frame(Southwest_Baltimore_evolution,Year)

Southwest_Baltimore_map <- ggplot(Southwest_Baltimore,aes(x=Year,y=Southwest_Baltimore_evolution)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 250), breaks =seq(0,250,50)) +
  labs(title ="Southwest Baltimore",x="Year",y="Crime")

grid.arrange(Washington_Village_Pigtown_map,Harbor_East_Little_Italy_map,Madison_East_End_map,Southwest_Baltimore_map,nrow=2,ncol=2,top="Four areas with HIGH crime and LOW CCTV density")

knitr::kable(Crime_Evolution_VS_CCTV[c(26,33,51,54),c(1,5)],col.names = c('Community','Crime per Capita % change')) %>%
  kable_styling(position = "center")
Community                     Crime per Capita % change
Harbor East/Little Italy      -27.3
Madison/East End              12.9
Southwest Baltimore           20.8
Washington Village/Pigtown    11.9

Comparing the four members of the high crime, high CCTV density cluster with the four members of the high crime, low CCTV density cluster, we observe that crime per capita increased in three of the four low CCTV density areas, against two of the four high CCTV density areas. This could indicate that crime decreased more often in areas with high CCTV density and therefore that CCTVs may be an effective crime deterrent, but it is crucial to temper this statement with considerations of statistical significance. The decreases observed in Cherry Hill and Sandtown-Winchester/Harlem Park could be the result of coincidence or could be attributed to many other factors, and with only four areas per cluster it is particularly difficult to draw conclusions. It should also be noted that, of the two high crime, high CCTV areas where crime fell, one (Cherry Hill) saw only a very slight fall.
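To illustrate why a four-against-four comparison carries little weight, one could run an exact test on the counts reported above (a decrease in two of the four high CCTV density areas, against one of the four low CCTV density areas). This is only a sketch of one possible check and not part of the original analysis; fisher.test is part of base R and the object names are illustrative.

```r
# Rows: cluster; columns: number of areas where crime per capita
# fell or rose between 2014 and 2019 (counts taken from the two tables above)
counts <- matrix(c(2, 2,
                   1, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(cluster = c("high CCTV", "low CCTV"),
                                 change  = c("decreased", "increased")))

# Fisher's exact test is suited to very small counts
ft <- fisher.test(counts)
ft$p.value
```

With samples this small the p-value is far above any conventional threshold, which is in line with the caution expressed above.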

Looking at how crime has evolved throughout Baltimore City, one observes a general downward trend since 2017. The decreases observed in areas belonging to the high crime, high CCTV density cluster should therefore be interpreted with caution, because factors other than CCTVs might explain them. Finally, we fitted a simple linear regression of the percentage change in crime per capita on CCTV density. We found no significant relationship: the slope is not significant at the 10% level and \(R^2\) is very low (3.3%).

All this leads us to believe that it is difficult to attribute real effectiveness to CCTVs.

Crime_Yearly_evolution_map <- crime_data_with_areas %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = "Overall, crime seems to have decreased over the 2017-2020 period",subtitle=str_wrap(("This explains why we should be careful when considering CCTV effectiveness"),width=80),x="Year",y="Crime occurrences")

Crime_Yearly_evolution_map

regression7 <- lm(Crime_Evolution_VS_CCTV$change_perc~Crime_Evolution_VS_CCTV$density_perc)


CCTV density VS Crime evolution
Dependent variable: Crime per Capita % change
CCTV Density           1.510 (1.130)
Intercept              0.528 (3.330)
Observations           55
R2                     0.033
Adjusted R2            0.015
Residual Std. Error    19.600 (df = 53)
F Statistic            1.810 (df = 1; 53)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
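The absence of significance stars on the CCTV density coefficient can be checked by hand from the estimate and its standard error. The sketch below uses only the figures printed in the table above; the variable names are illustrative.

```r
# Figures copied from the regression table above
estimate  <- 1.510   # slope on CCTV density
std_error <- 1.130   # its standard error
df_resid  <- 53      # residual degrees of freedom

# t statistic and two-sided p-value under the t distribution
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

c(t = t_stat, p = p_value)
```

A p-value above 0.1 means that the slope cannot be distinguished from zero at the 10% level, which is why the coefficient carries no star.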



4.1.2 Analysis of where crime took place: August 2021

To further observe how CCTVs impact crime, we set out to locate the crimes committed and compare their locations with those of the surveillance cameras. Given that a surveillance camera captures activity within 256 ft (~2 blocks), it is interesting to observe whether or not there are “crime-free zones” around the cameras. We only select crimes committed in August 2021 to keep the map readable (a larger time frame would clutter it). We choose August 2021 because it is the latest full month in our data set; since we have no information on when each CCTV was installed, taking the latest period makes it most likely that the cameras in the data set were already in place. As before, we create a data table, assign coordinates and define the CRS (here “EPSG:4326”, which we then transform using spTransform), and create a map with tm_shape to visualise the results. The output shows where crimes took place (in red) compared with the CCTV locations (in black). Zooming in on the map, we see that some crimes are committed directly in front of CCTVs. Although this is not conclusive evidence, this observation goes against the idea that CCTVs are effective crime deterrents.

crime_spatial <- as.data.table(crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-08-01") & CrimeDateTime <= as.Date("2021-08-31")))
coordinates(crime_spatial) <-  c("Longitude","Latitude")
proj4string(crime_spatial) <-  CRS("+init=epsg:4326")
crime_spatial <- spTransform(crime_spatial,crs.geo1)

August21Crimes_VS_CCTV <- tm_shape(baltimore) + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.1,frame.lwd = 5)+ tm_shape(balt_dat) + tm_dots(col="black")+tm_shape(crime_spatial)+tm_dots(col="red",alpha=0.5)

Crimes committed in August 2021 (in red) VS CCTVs (in black)
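The visual comparison could be complemented by counting how many crimes fall within the 256 ft range of a camera. The following is a minimal, dependency-free sketch using the haversine formula on hypothetical coordinates; the function haversine_m, the data frames cameras and crimes and all values are assumptions made for illustration, not the objects used in this report.

```r
# Great-circle distance in metres between points given in decimal degrees
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates, for illustration only
cameras <- data.frame(lon = c(-76.615, -76.600), lat = c(39.290, 39.300))
crimes  <- data.frame(lon = c(-76.6151, -76.650), lat = c(39.2901, 39.320))

radius_m <- 256 * 0.3048  # 256 ft expressed in metres

# Distance from each crime to its nearest camera
nearest_m <- sapply(seq_len(nrow(crimes)), function(i) {
  min(haversine_m(crimes$lon[i], crimes$lat[i], cameras$lon, cameras$lat))
})

# Flag crimes committed within camera range and compute their share
crimes$within_camera_range <- nearest_m <= radius_m
mean(crimes$within_camera_range)
```

On the real data one would rather work with the projected coordinates or a spatial package; the haversine formula is used here only to keep the sketch self-contained.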


To examine this point further, we focus on one specific area. As a somewhat arbitrary choice, we pick the area with the highest crime per capita in August 2021, which requires calculating the August crime rate for each area. The highest rate is found in Downtown.

We therefore take a closer look at the Downtown area.

CrimePerCapitaPerAreaAugust2021 <- crime_data_with_areas %>%  filter(CrimeDateTime >= as.Date("2021-08-01") & CrimeDateTime <= as.Date("2021-08-31")) %>% 
  group_by(Community) %>%
  summarize(CrimeFrequency=n())

CrimePerCapitaPerAreaAugust2021 <-mutate(CrimePerCapitaPerAreaAugust2021,CrimePer1000inhabitants=((CrimePerCapitaPerAreaAugust2021$CrimeFrequency/population_data$tpop20)*1000))
#Note: the division above assumes that the rows of population_data are in the same order as the summarised areas
#We see that Downtown is the area with the highest crime rate in August 2021; we focus on that area to see whether crimes take place directly next to CCTVs
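Dividing two columns taken from different tables relies on both tables listing the areas in the same order, and it breaks silently if an area recorded no crime during the month. A hedged alternative is to join on the area name first. The sketch below uses base R merge on made-up data; the data frames crime_counts and population, the column tpop and all values are hypothetical.

```r
# Hypothetical inputs: crime counts (an area without crime is simply absent)
# and population per area. Names and numbers are made up.
crime_counts <- data.frame(Community = c("Area A", "Area C"),
                           CrimeFrequency = c(30, 20))
population   <- data.frame(Community = c("Area A", "Area B", "Area C"),
                           tpop = c(10000, 5000, 4000))

# Join on the area name so that each count meets its own population
rates <- merge(population, crime_counts, by = "Community", all.x = TRUE)

# Areas absent from the crime counts had no recorded crime
rates$CrimeFrequency[is.na(rates$CrimeFrequency)] <- 0

rates$CrimePer1000inhabitants <- rates$CrimeFrequency / rates$tpop * 1000

# Area with the highest rate
top_area <- rates$Community[which.max(rates$CrimePer1000inhabitants)]
top_area
```

The join makes the pairing of counts and populations explicit, at the cost of one extra step.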

We create a “sub-map” in the exact same way as we did for the prison in section 3.2.3. In Downtown, we quite clearly see that some crimes (red points) are committed right next to some CCTVs (black triangles).

Downtown_area <-  st_bbox(c(xmin = -8531335.08, xmax = -8526873.06,
                      ymin = 4762527.65, ymax = 4765236.47), #ymin must be the smaller of the two values
                    crs = st_crs(baltimore)) %>% st_as_sfc()
 
Downtown_map <- tm_shape(Downtown_area) + tm_borders(col="white")+ tm_shape(baltimore) + tm_borders(col="black") + tm_layout(inner.margins = 0.05,frame.lwd = 5,main.title = "Zoom on Downtown Area",main.title.position = c('left', 'top'))+tm_scale_bar(position = c("left", "top"))+ tm_shape(balt_dat) + tm_symbols(shape = 2, col = "black", size = 0.07)+tm_shape(crime_spatial)+tm_dots(col="red")

Baltimore_map_2 <- tm_shape(baltimore) + tm_borders()+ tm_shape(Downtown_area) + tm_borders(lwd = 1.5,col = "red") + tm_layout(frame.lwd = 6,inner.margins = 0.05)

tmap_mode("plot")
Downtown_map
print(Baltimore_map_2, vp = viewport(0.8, 0.27, width = 0.5, height = 0.5)) #By running these two lines together, we obtain the map with an additional overview


The fact that we observe crimes committed right next to surveillance cameras is consistent with the findings so far and leads us to answer our first research question: we find no evidence that CCTVs reduce crime.

4.2 What types of crimes may be deterred by surveillance cameras?

Although we could not demonstrate the effectiveness of surveillance cameras on crime in general, it is interesting to investigate whether different results are obtained when crime is broken down by type. The methodology is the same as for crime in general. We start with felonies and misdemeanors, then analyse violent and property crimes.

4.2.1 CCTVs VS Felonies and Misdemeanors

The simple linear regressions show a modest \(R^2\) for both felonies (0.38) and misdemeanors (0.39), so the correlation with CCTV density is roughly the same for the two categories. Plotting the observations makes this tendency visible. The blue line represents the regression line.

#Felonies

CCTV_VS_Felony <- CCTV_per_area %>% 
  left_join(FelonyStats,by="Community")

regression5 <- lm(CCTV_VS_Felony$FelonyPerCapitaPerArea~CCTV_VS_Felony$density_perc)
CCTV vs Felony
Dependent variable: Felony (per 1000 inhabitants)
CCTV Density           39.600*** (6.880)
Intercept              218.000*** (20.200)
Observations           56
R2                     0.380
Adjusted R2            0.369
Residual Std. Error    120.000 (df = 54)
F Statistic            33.100*** (df = 1; 54)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01



ggplot(data=CCTV_VS_Felony,mapping= aes(x=density_perc,y=FelonyPerCapitaPerArea)) + 
  labs(title = "As CCTV density increases, felony tends to increase as well.", x="CCTV Density ",y="Felony")+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)

#Misdemeanors

CCTV_VS_misdemeanors <- CCTV_per_area %>% 
  left_join(MisdemeanorStats,by="Community")

regression6 <- lm(CCTV_VS_misdemeanors$MisdemeanorPerCapitaPerArea~CCTV_VS_misdemeanors$density_perc)
CCTV vs Misdemeanor
Dependent variable: Misdemeanor (per 1000 inhabitants)
CCTV Density           51.600*** (8.780)
Intercept              245.000*** (25.800)
Observations           56
R2                     0.390
Adjusted R2            0.379
Residual Std. Error    153.000 (df = 54)
F Statistic            34.500*** (df = 1; 54)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01



ggplot(data=CCTV_VS_misdemeanors,mapping= aes(x=density_perc,y=MisdemeanorPerCapitaPerArea)) + 
   labs(title = str_wrap(("As CCTV density increases, misdemeanor tends to increase as well."),width = 60), x="CCTV Density ",y="Misdemeanor")+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)
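To relate the \(R^2\) values above to a correlation coefficient: in a regression with a single predictor, \(R^2\) is the square of the Pearson correlation between the two variables. The sketch below checks this on made-up data (x and y are synthetic, not the Baltimore variables) and then applies the square root to the \(R^2\) values reported in the two tables.

```r
# Synthetic data, for illustration only
set.seed(1)
x <- runif(56, 0, 8)                     # made-up CCTV densities
y <- 200 + 40 * x + rnorm(56, sd = 120)  # made-up felony rates

fit <- lm(y ~ x)
r2  <- summary(fit)$r.squared

# With one predictor, R^2 equals the squared Pearson correlation
same <- isTRUE(all.equal(r2, cor(x, y)^2))
same

# Correlations implied by the reported R^2 values (both slopes are positive)
implied_r <- sqrt(c(felony = 0.380, misdemeanor = 0.390))
round(implied_r, 2)
```

Both reported values correspond to a correlation of about 0.62 with CCTV density.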

We are in the exact same situation as with crime in general: to comment on the effectiveness of CCTVs against felonies and misdemeanors, we must analyse how each evolved over time in areas with high and low CCTV density. We start with felonies.

fviz_nbclust(sc.Community_data_Clustering[,c(5,3)], kmeans, method="wss")+
  geom_vline(xintercept = 5, linetype = 2) + # add line for better visualisation
  labs(subtitle = "Elbow method")  #We can determine the optimal number of cluster, 5 clusters seems to be reasonable



set.seed(40)
km.clust5 <- kmeans(sc.Community_data_Clustering[,c(5,3)], 5, nstart = 25) #It is important to specify a nstart parameter. Indeed, if we fail to do so, there is only one choice of random set of rows chosen in the dataset as initial centers.

fviz_cluster(km.clust5, data=sc.Community_data_Clustering[,c(5,3)], repel=TRUE) + labs(x="Felony per capita",y="CCTV density",title = str_wrap(("With 5 clusters, the between-cluster SS accounts for 87.1% of the total SS"),width=65),subtitle=str_wrap(("We will focus on areas in the high felony, high CCTV density cluster as well as those in the high felony, low CCTV density cluster"),width=80))


Downtown_Seton_Hill_evolution_Felony <- as.vector(t(Community_data[14,c(73,71,69,67,65,63,61)]))

Downtown_Seton_Hill_Felony <- data.frame(Downtown_Seton_Hill_evolution_Felony,Year)

Downtown_Seton_Hill_map_Felony <- ggplot(Downtown_Seton_Hill_Felony,aes(x=Year,y=Downtown_Seton_Hill_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Downtown/Seton Hill",x="Year",y="Felony")

Sandtown_Winchester_Harlem_Park_evolution_Felony <- as.vector(t(Community_data[47,c(73,71,69,67,65,63,61)]))

Sandtown_Winchester_Harlem_Park_Felony <- data.frame(Sandtown_Winchester_Harlem_Park_evolution_Felony,Year)

Sandtown_Winchester_Harlem_Park_map_Felony <- ggplot(Sandtown_Winchester_Harlem_Park_Felony,aes(x=Year,y=Sandtown_Winchester_Harlem_Park_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Sandtown-Winchester/Harlem Park",x="Year",y="Felony")

Oldtown_Middle_East_evolution_Felony <- as.vector(t(Community_data[41,c(73,71,69,67,65,63,61)]))

Oldtown_Middle_East_Felony <- data.frame(Oldtown_Middle_East_evolution_Felony,Year)

Oldtown_Middle_East_map_Felony <- ggplot(Oldtown_Middle_East_Felony,aes(x=Year,y=Oldtown_Middle_East_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Oldtown/Middle East",x="Year",y="Felony")

Cherry_Hill_evolution_Felony <- as.vector(t(Community_data[7,c(73,71,69,67,65,63,61)]))

Cherry_Hill_Felony <- data.frame(Cherry_Hill_evolution_Felony,Year)

Cherry_Hill_map_Felony <- ggplot(Cherry_Hill_Felony,aes(x=Year,y=Cherry_Hill_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Cherry Hill",x="Year",y="Felony")

grid.arrange(Downtown_Seton_Hill_map_Felony,Sandtown_Winchester_Harlem_Park_map_Felony,Oldtown_Middle_East_map_Felony,Cherry_Hill_map_Felony,nrow=2,ncol=2,top="Four areas with HIGH felony and HIGH CCTV density")

Felony_Evolution_VS_CCTV <- Community_data[,c(1,63,73,4)] %>% 
  mutate(change_perc=((Community_data$FelonyPer1000inhabitants19/Community_data$FelonyPer1000inhabitants14)-1)*100)

knitr::kable(Felony_Evolution_VS_CCTV[c(7,14,41,47),c(1,5)],col.names = c('Community','Felony per Capita % change')) %>%
  kable_styling(position = "center")
Community                          Felony per Capita % change
Cherry Hill                        0.305
Downtown/Seton Hill                84.988
Oldtown/Middle East                42.035
Sandtown-Winchester/Harlem Park    -5.802



Poppleton_evolution_Felony <- as.vector(t(Community_data[46,c(73,71,69,67,65,63,61)]))

Poppleton_Felony <- data.frame(Poppleton_evolution_Felony,Year)

Poppleton_map_Felony <- ggplot(Poppleton_Felony,aes(x=Year,y=Poppleton_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Poppleton/The Terraces/Hollins Market",x="Year",y="Felony")

Madison_East_End_evolution_Felony <- as.vector(t(Community_data[33,c(73,71,69,67,65,63,61)]))

Madison_East_End_Felony <- data.frame(Madison_East_End_evolution_Felony,Year)

Madison_East_End_map_Felony <- ggplot(Madison_East_End_Felony,aes(x=Year,y=Madison_East_End_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Madison/East End",x="Year",y="Felony")

Clifton_Berea_evolution_Felony <- as.vector(t(Community_data[10,c(73,71,69,67,65,63,61)]))

Clifton_Berea_Felony <- data.frame(Clifton_Berea_evolution_Felony,Year)

Clifton_Berea_map_Felony <- ggplot(Clifton_Berea_Felony,aes(x=Year,y=Clifton_Berea_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Clifton-Berea",x="Year",y="Felony")

Midway_Coldstream_evolution_Felony <- as.vector(t(Community_data[36,c(73,71,69,67,65,63,61)]))

Midway_Coldstream_Felony <- data.frame(Midway_Coldstream_evolution_Felony,Year)

Midway_Coldstream_map_Felony <- ggplot(Midway_Coldstream_Felony,aes(x=Year,y=Midway_Coldstream_evolution_Felony)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Midway/Coldstream",x="Year",y="Felony")

grid.arrange(Poppleton_map_Felony,Madison_East_End_map_Felony,Clifton_Berea_map_Felony,Midway_Coldstream_map_Felony,nrow=2,ncol=2,top="Four areas with HIGH felony and LOW CCTV density")

knitr::kable(Felony_Evolution_VS_CCTV[c(10,33,36,46),c(1,5)],col.names = c('Community','Felony per Capita % change')) %>%
  kable_styling(position = "center")
Community                                Felony per Capita % change
Clifton-Berea                            38.5
Madison/East End                         25.1
Midway/Coldstream                        20.8
Poppleton/The Terraces/Hollins Market    -14.6

In both clusters, felonies increased in three out of four areas, regardless of CCTV density. This might indicate that CCTVs are not effective against felonies either. Plotting felonies over time for the whole of Baltimore shows that they also increased slightly between 2014 and 2019. Again, with so few areas these comparisons are not statistically significant and are difficult to interpret. Regressing the percentage change in felonies per capita on CCTV density explains little of the variation (\(R^2\) is equal to 8.6%), and the slope is positive and significant at the 5% level: if anything, felonies grew more where CCTV density is higher. This supports the idea that surveillance cameras are not effective in preventing felonies either.

Felony_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(Category=="Felony") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = str_wrap(("Comparing the 2014 felony level with the 2019 felony level, one observes that felony has slightly increased"),width=65),subtitle=str_wrap(("This further explains why we should be careful when considering CCTV effectiveness in deterring felonies"),width=80),x="Year",y="Felony occurrences")

Felony_Yearly_evolution_map


CCTV density VS Felony evolution
Dependent variable: Felony per Capita % change
CCTV Density           3.320** (1.490)
Intercept              6.780 (4.410)
Observations           55
R2                     0.086
Adjusted R2            0.069
Residual Std. Error    25.900 (df = 53)
F Statistic            4.990** (df = 1; 53)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01


We continue with misdemeanors, following exactly the same procedure.

fviz_nbclust(sc.Community_data_Clustering[,c(6,3)], kmeans, method="wss")+
  geom_vline(xintercept = 5, linetype = 2) + # add line for better visualisation
  labs(subtitle = "Elbow method")  #We can determine the optimal number of cluster, 5 clusters seems to be reasonable



set.seed(50)
km.clust6 <- kmeans(sc.Community_data_Clustering[,c(6,3)], 5, nstart = 25) #It is important to specify a nstart parameter. Indeed, if we fail to do so, there is only one choice of random set of rows chosen in the dataset as initial centers.
fviz_cluster(km.clust6, data=sc.Community_data_Clustering[,c(6,3)], repel=TRUE) + labs(x="Misdemeanor per capita",y="CCTV density",title = str_wrap(("With 5 clusters, the between-cluster SS accounts for 84.41% of the total SS"),width=65),subtitle=str_wrap(("We will focus on areas in the high misdemeanor, high CCTV density cluster as well as those in the high misdemeanor, low CCTV density cluster"),width=80))


Downtown_Seton_Hill_evolution_Misdemeanor <- as.vector(t(Community_data[14,c(89,87,85,83,81,79,77)]))

Downtown_Seton_Hill_Misdemeanor <- data.frame(Downtown_Seton_Hill_evolution_Misdemeanor,Year)

Downtown_Seton_Hill_map_Misdemeanor <- ggplot(Downtown_Seton_Hill_Misdemeanor,aes(x=Year,y=Downtown_Seton_Hill_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Downtown/Seton Hill",x="Year",y="Misdemeanor")

Oldtown_Middle_East_evolution_Misdemeanor <- as.vector(t(Community_data[41,c(89,87,85,83,81,79,77)]))

Oldtown_Middle_East_Misdemeanor <- data.frame(Oldtown_Middle_East_evolution_Misdemeanor,Year)

Oldtown_Middle_East_map_Misdemeanor <- ggplot(Oldtown_Middle_East_Misdemeanor,aes(x=Year,y=Oldtown_Middle_East_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Oldtown/Middle East",x="Year",y="Misdemeanor")

Upton_Druid_Heights_evolution_Misdemeanor <- as.vector(t(Community_data[53,c(89,87,85,83,81,79,77)]))

Upton_Druid_Heights_Misdemeanor <- data.frame(Upton_Druid_Heights_evolution_Misdemeanor,Year)

Upton_Druid_Heights_map_Misdemeanor <- ggplot(Upton_Druid_Heights_Misdemeanor,aes(x=Year,y=Upton_Druid_Heights_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Upton/Druid Heights",x="Year",y="Misdemeanor")

Sandtown_Winchester_Harlem_Park_evolution_Misdemeanor <- as.vector(t(Community_data[47,c(89,87,85,83,81,79,77)]))

Sandtown_Winchester_Harlem_Park_Misdemeanor <- data.frame(Sandtown_Winchester_Harlem_Park_evolution_Misdemeanor,Year)

Sandtown_Winchester_Harlem_Park_map_Misdemeanor <- ggplot(Sandtown_Winchester_Harlem_Park_Misdemeanor,aes(x=Year,y=Sandtown_Winchester_Harlem_Park_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Sandtown-Winchester/Harlem Park",x="Year",y="Misdemeanor")

grid.arrange(Downtown_Seton_Hill_map_Misdemeanor,Oldtown_Middle_East_map_Misdemeanor,Upton_Druid_Heights_map_Misdemeanor,Sandtown_Winchester_Harlem_Park_map_Misdemeanor,nrow=2,ncol=2,top="Four areas with HIGH misdemeanor and HIGH CCTV density")

Misdemeanor_Evolution_VS_CCTV <- Community_data[,c(1,79,89,4)] %>% 
  mutate(change_perc=((Community_data$MisdemeanorPer1000inhabitants19/Community_data$MisdemeanorPer1000inhabitants14)-1)*100)

knitr::kable(Misdemeanor_Evolution_VS_CCTV[c(14,41,47,53),c(1,5)],col.names = c('Community','Misdemeanor per Capita % change')) %>% kable_styling(position = "center")
Community                          Misdemeanor per Capita % change
Downtown/Seton Hill                21.88
Oldtown/Middle East                14.96
Sandtown-Winchester/Harlem Park    -10.69
Upton/Druid Heights                7.74



Washington_Village_Pigtown_evolution_Misdemeanor <- as.vector(t(Community_data[54,c(89,87,85,83,81,79,77)]))

Washington_Village_Pigtown_Misdemeanor <- data.frame(Washington_Village_Pigtown_evolution_Misdemeanor,Year)

Washington_Village_Pigtown_map_Misdemeanor <- ggplot(Washington_Village_Pigtown_Misdemeanor,aes(x=Year,y=Washington_Village_Pigtown_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Washington Village/Pigtown",x="Year",y="Misdemeanor")

Harbor_East_Little_Italy_evolution_Misdemeanor <- as.vector(t(Community_data[26,c(89,87,85,83,81,79,77)]))

Harbor_East_Little_Italy_Misdemeanor <- data.frame(Harbor_East_Little_Italy_evolution_Misdemeanor,Year)

Harbor_East_Little_Italy_map_Misdemeanor <- ggplot(Harbor_East_Little_Italy_Misdemeanor,aes(x=Year,y=Harbor_East_Little_Italy_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Harbor East/Little Italy",x="Year",y="Misdemeanor")

Canton_evolution_Misdemeanor <- as.vector(t(Community_data[5,c(89,87,85,83,81,79,77)]))

Canton_Misdemeanor <- data.frame(Canton_evolution_Misdemeanor,Year)

Canton_map_Misdemeanor <- ggplot(Canton_Misdemeanor,aes(x=Year,y=Canton_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Canton",x="Year",y="Misdemeanor")

Penn_North_Reservoir_Hill_evolution_Misdemeanor <- as.vector(t(Community_data[44,c(89,87,85,83,81,79,77)]))

Penn_North_Reservoir_Hill_Misdemeanor <- data.frame(Penn_North_Reservoir_Hill_evolution_Misdemeanor,Year)

Penn_North_Reservoir_Hill_map_Misdemeanor <- ggplot(Penn_North_Reservoir_Hill_Misdemeanor,aes(x=Year,y=Penn_North_Reservoir_Hill_evolution_Misdemeanor)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 150), breaks =seq(0,150,25)) + 
  labs(title = "Penn North/Reservoir Hill",x="Year",y="Misdemeanor")

grid.arrange(Washington_Village_Pigtown_map_Misdemeanor,Harbor_East_Little_Italy_map_Misdemeanor,Canton_map_Misdemeanor,Penn_North_Reservoir_Hill_map_Misdemeanor,nrow=2,ncol=2,top="Four areas with HIGH misdemeanor and LOW CCTV density")

knitr::kable(Misdemeanor_Evolution_VS_CCTV[c(5,26,44,54),c(1,5)],col.names = c('Community','Misdemeanor per Capita % change')) %>%
  kable_styling(position = "center")
Community                     Misdemeanor per Capita % change
Canton                        -0.904
Harbor East/Little Italy      -45.609
Penn North/Reservoir Hill     5.405
Washington Village/Pigtown    7.156

Misdemeanors decreased more often in areas belonging to the low CCTV density cluster (two areas out of four, against one out of four in the high CCTV density cluster). Again, this could suggest that surveillance cameras are not very effective in reducing misdemeanors. Moreover, misdemeanors have followed a downward trend across the whole city since 2017. These results should be taken with caution as, again, they are not statistically significant and could be due to chance. To get a better overall picture of the relationship, we regressed the percentage change in misdemeanors per capita on CCTV density. The relationship between the two variables is very weak (\(R^2\) of 0.1%, slope not significant), suggesting once again that surveillance cameras are ineffective against misdemeanors.

Misdemeanor_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(Category=="Misdemeanor") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = "In Baltimore, misdemeanors started to decrease from 2017",subtitle=str_wrap(("This explains why we should be careful when considering CCTV effectiveness in deterring misdemeanors"),width=75),x="Year",y="Misdemeanor occurrences")

Misdemeanor_Yearly_evolution_map

CCTV density VS Misdemeanor evolution
Dependent variable: Misdemeanor per Capita % change
CCTV Density           0.293 (1.210)
Intercept              -1.280 (3.590)
Observations           55
R2                     0.001
Adjusted R2            -0.018
Residual Std. Error    21.100 (df = 53)
F Statistic            0.058 (df = 1; 53)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01



4.2.2 CCTVs VS Violent and Property Crime

It seems that the severity of the crime (i.e. whether it is a felony or a misdemeanor) does not influence the crime-deterring potential of CCTVs. We now examine whether a difference appears when crime is decomposed into violent and property crime. We first perform one simple linear regression between CCTV density and violent crime and another between CCTV density and property crime. The first thing to note is a much stronger correlation between CCTV density and violent crime than between CCTV density and property crime. The \(R^2\) of the first regression is 51.6%, higher than any other \(R^2\) obtained so far, against 28.4% for property crime. This suggests that violent crime per capita is a considerable determinant of CCTV density.

CCTV_VS_ViolentCrime <- CCTV_per_area %>% 
  left_join(ViolentStats,by="Community")

regression21 <- lm(CCTV_VS_ViolentCrime$ViolentCrimePerCapitaPerArea~CCTV_VS_ViolentCrime$density_perc)
CCTV vs Violent Crime
Dependent variable: Violent Crime (per 1000 inhabitants)
CCTV Density           53.700*** (7.080)
Intercept              205.000*** (20.800)
Observations           56
R2                     0.516
Adjusted R2            0.507
Residual Std. Error    124.000 (df = 54)
F Statistic            57.600*** (df = 1; 54)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01



ggplot(Community_data,aes(x=density_perc,y=ViolentCrimePerCapitaPerArea))+
  geom_point()+
  geom_smooth(method = lm,color="blue",size=0.3)+
  labs(x="CCTV Density per Area",y="Violent crime (per 1000 inhabitants)",title="Violent crime increases with CCTV density",subtitle = str_wrap("We observe that the lack of data about CCTV in some areas might influence our results.",width=45)) #width belongs to str_wrap, not to labs



CCTV_VS_PropertyCrime <- CCTV_per_area %>% 
  left_join(PropertyStats,by="Community")

regression22 <- lm(CCTV_VS_PropertyCrime$PropertyCrimePerCapitaPerArea~CCTV_VS_PropertyCrime$density_perc)
CCTV vs Property Crime
Dependent variable: Property Crime (per 1000 inhabitants)
CCTV Density           37.400*** (8.100)
Intercept              258.000*** (23.800)
Observations           56
R2                     0.284
Adjusted R2            0.270
Residual Std. Error    141.000 (df = 54)
F Statistic            21.400*** (df = 1; 54)
Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01



ggplot(Community_data,aes(x=density_perc,y=PropertyCrimePerCapitaPerArea))+
  geom_point()+
  geom_smooth(method = lm,color="blue",size=0.3)+
  labs(x="CCTV Density per Area",y="Property crime (per 1000 inhabitants)",title="Property crime is slightly correlated with CCTV density",subtitle = str_wrap("The correlation is weaker than between violent crime and CCTV density",width=45)) #width belongs to str_wrap, not to labs

Yet, as in previous sections, we have to analyse how violent and property crime evolved over time in areas with different CCTV densities to be in a position to comment on CCTV effectiveness. We start with violent crime.

fviz_nbclust(sc.Community_data_Clustering[,c(7,3)], kmeans, method="wss")+
  geom_vline(xintercept = 5, linetype = 2) + # add line for better visualisation
  labs(subtitle = "Elbow method")  #We can determine the optimal number of cluster, 5 clusters seems to be reasonable



set.seed(20)
km.clust3 <- kmeans(sc.Community_data_Clustering[,c(7,3)], 5, nstart = 25) # nstart = 25 runs 25 random sets of initial centers and keeps the best solution; with the default (nstart = 1), only a single random start is tried.
fviz_cluster(km.clust3, data=sc.Community_data_Clustering[,c(7,3)], repel=TRUE) + labs(x="Violent crime per capita",y="CCTV density",title = str_wrap(("With 5 clusters, we have a good ratio of 87.4% between between and within SS"),width=65),subtitle=str_wrap(("We will focus on areas in the high violent crime, high CCTV density cluster as well as those in the high violent crime, low CCTV density cluster"),width=80))
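The two quantities used in this clustering step can be sketched in base R. The snippet assumes that the percentage quoted in the plot title is the between_SS / total_SS figure that `kmeans` prints, which the source does not state explicitly. The matrix `toy` is simulated for illustration only.

```r
set.seed(1)
# Hypothetical toy data: two standardised variables with three loose groups.
toy <- scale(rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2),
  matrix(rnorm(40, mean = 8), ncol = 2)
))

# Total within-cluster sum of squares for several k
# (the quantity an elbow plot displays on its y axis).
ks <- 2:6
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

# Share of the total sum of squares captured between clusters for a chosen k.
km <- kmeans(toy, centers = 3, nstart = 25)
ratio <- km$betweenss / km$totss

print(round(wss, 2))
print(ratio)
```

A ratio close to 1 means most of the variation lies between clusters rather than within them.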



Downtown_Seton_Hill_evolution_Violent <- as.vector(t(Community_data[14,c(41,39,37,35,33,31,29)]))

Downtown_Seton_Hill_Violent <- data.frame(Downtown_Seton_Hill_evolution_Violent,Year)

Downtown_Seton_Hill_map_Violent <- ggplot(Downtown_Seton_Hill_Violent,aes(x=Year,y=Downtown_Seton_Hill_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Downtown/Seton Hill",x="Year",y="Violent Crime")

Oldtown_Middle_East_evolution_Violent <- as.vector(t(Community_data[41,c(41,39,37,35,33,31,29)]))

Oldtown_Middle_East_Violent <- data.frame(Oldtown_Middle_East_evolution_Violent,Year)

Oldtown_Middle_East_map_Violent <- ggplot(Oldtown_Middle_East_Violent,aes(x=Year,y=Oldtown_Middle_East_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Oldtown/Middle East",x="Year",y="Violent Crime")

Sandtown_Winchester_Harlem_Park_evolution_Violent <- as.vector(t(Community_data[47,c(41,39,37,35,33,31,29)]))

Sandtown_Winchester_Harlem_Park_Violent <- data.frame(Sandtown_Winchester_Harlem_Park_evolution_Violent,Year)

Sandtown_Winchester_Harlem_Park_map_Violent <- ggplot(Sandtown_Winchester_Harlem_Park_Violent,aes(x=Year,y=Sandtown_Winchester_Harlem_Park_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Sandtown-Winchester/Harlem Park",x="Year",y="Violent Crime")

Cherry_Hill_evolution_Violent <- as.vector(t(Community_data[7,c(41,39,37,35,33,31,29)]))

Cherry_Hill_Violent <- data.frame(Cherry_Hill_evolution_Violent,Year)

Cherry_Hill_map_Violent <- ggplot(Cherry_Hill_Violent,aes(x=Year,y=Cherry_Hill_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Cherry Hill",x="Year",y="Violent Crime")

library(pdp)

grid.arrange(Downtown_Seton_Hill_map_Violent,Oldtown_Middle_East_map_Violent,Sandtown_Winchester_Harlem_Park_map_Violent,Cherry_Hill_map_Violent,nrow=2,ncol=2,top="Four areas with HIGH violent crime and HIGH CCTV density")

Violent_Crime_Evolution_VS_CCTV <- Community_data[,c(1,31,41,4)] %>% 
  mutate(change_perc=((Community_data$ViolentCrimePer1000inhabitants19/Community_data$ViolentCrimePer1000inhabitants14)-1)*100)

Violent_Crime_Evolution_VS_CCTV[c(7,14,41,47),c(-3)]
#> # A tibble: 4 x 4
#>   Community           ViolentCrimePer1000in~ density_perc change_perc
#>   <chr>                                <dbl>        <dbl>       <dbl>
#> 1 Cherry Hill                           49.0         7.06       13.6 
#> 2 Downtown/Seton Hill                  123.         10.0        57.0 
#> 3 Oldtown/Middle East                   84.3         7.66       34.0 
#> 4 Sandtown-Wincheste~                   59.4         7.42       -4.86
knitr::kable(Violent_Crime_Evolution_VS_CCTV[c(7,14,41,47),c(1,5)],col.names = c('Community','Violent Crime per Capita % change')) %>% kable_styling(position = "center")
Community                          Violent Crime per Capita % change
Cherry Hill                        13.62
Downtown/Seton Hill                57.03
Oldtown/Middle East                33.97
Sandtown-Winchester/Harlem Park    -4.86
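The percentage change shown in this table follows the formula used in the `mutate` call above, (rate in 2019 / rate in 2014 - 1) * 100. A minimal worked sketch with made-up rates (the data frame `rates` and its values are hypothetical):

```r
# Made-up rates per 1000 inhabitants, for illustration only.
rates <- data.frame(
  Community = c("Area A", "Area B"),
  rate14 = c(50, 80),
  rate19 = c(60, 76)
)

# Relative change between 2014 and 2019, expressed in percent.
rates$change_perc <- (rates$rate19 / rates$rate14 - 1) * 100

print(rates)
```

Area A goes from 50 to 60, a rise of 20%, while Area B goes from 80 to 76, a fall of 5%.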



Clifton_Berea_evolution_Violent <- as.vector(t(Community_data[10,c(41,39,37,35,33,31,29)]))

Clifton_Berea_Violent <- data.frame(Clifton_Berea_evolution_Violent,Year)

Clifton_Berea_map_Violent <- ggplot(Clifton_Berea_Violent,aes(x=Year,y=Clifton_Berea_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Clifton-Berea",x="Year",y="Violent Crime")

Pimlico_Arlington_Hilltop_evolution_Violent <- as.vector(t(Community_data[45,c(41,39,37,35,33,31,29)]))

Pimlico_Arlington_Hilltop_Violent <- data.frame(Pimlico_Arlington_Hilltop_evolution_Violent,Year)

Pimlico_Arlington_Hilltop_map_Violent <- ggplot(Pimlico_Arlington_Hilltop_Violent,aes(x=Year,y=Pimlico_Arlington_Hilltop_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Pimlico/Arlington/Hilltop",x="Year",y="Violent Crime")

Midway_Coldstream_evolution_Violent <- as.vector(t(Community_data[36,c(41,39,37,35,33,31,29)]))

Midway_Coldstream_Violent <- data.frame(Midway_Coldstream_evolution_Violent,Year)

Midway_Coldstream_map_Violent <- ggplot(Midway_Coldstream_Violent,aes(x=Year,y=Midway_Coldstream_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Midway/Coldstream",x="Year",y="Violent Crime")

Madison_East_End_evolution_Violent <- as.vector(t(Community_data[33,c(41,39,37,35,33,31,29)]))

Madison_East_End_Violent <- data.frame(Madison_East_End_evolution_Violent,Year)

Madison_East_End_map_Violent <- ggplot(Madison_East_End_Violent,aes(x=Year,y=Madison_East_End_evolution_Violent)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Madison/East End",x="Year",y="Violent Crime")

grid.arrange(Clifton_Berea_map_Violent,Pimlico_Arlington_Hilltop_map_Violent,Midway_Coldstream_map_Violent,Madison_East_End_map_Violent,nrow=2,ncol=2,top="Four areas with HIGH violent crime and LOW CCTV density")

knitr::kable(Violent_Crime_Evolution_VS_CCTV[c(10,33,36,45),c(1,5)],col.names = c('Community','Violent Crime per Capita % change')) %>% kable_styling(position = "center")
Community                    Violent Crime per Capita % change
Clifton-Berea                26.63
Madison/East End             21.34
Midway/Coldstream            20.14
Pimlico/Arlington/Hilltop    5.14
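The per-community charts above each repeat the same extract-and-plot steps. As an optional sketch (not the code used for this report), the yearly columns can instead be stacked into one long table so that a single plotting call covers all areas. The data frame `wide`, its column names and its values are hypothetical.

```r
# Hypothetical wide table: one row per community, one column per year.
wide <- data.frame(
  Community = c("Area A", "Area B"),
  rate14 = c(40, 80), rate15 = c(45, 85), rate16 = c(50, 90),
  stringsAsFactors = FALSE
)
year_cols <- c("rate14", "rate15", "rate16")
years <- c(2014, 2015, 2016)

# Stack the yearly columns into long format: one row per community and year.
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  data.frame(
    Community = wide$Community[i],
    Year = years,
    Rate = as.numeric(unlist(wide[i, year_cols]))
  )
}))

print(long)
```

A long table of this shape can be drawn in one call with ggplot2 by adding `facet_wrap(~ Community)`, instead of assembling separate plots with `grid.arrange`.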

This time, the only area where violent crime fell (Sandtown-Winchester/Harlem Park) belongs to the group with the highest concentration of cameras, whereas violent crime rose in all four low-density areas. However, before jumping to conclusions, it is worth noting that violent crime fell in only one neighbourhood of that group and that, as before, this result could be due to chance. It should also be noted that overall, violent crime in Baltimore has increased. The simple linear regression performed between CCTV density and violent crime evolution does not indicate any correlation between the two variables (\(R^2\) of 0.1%), leading us to conclude that CCTVs do not seem to be effective in reducing violent crime either.

Violent_Crime_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(VIO_PROP_CFS=="VIOLENT") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = str_wrap(("Overall, violent crime has increased in Baltimore over the 2014-2019 period"),width=65),x="Year",y="Violent crime occurrences")

Violent_Crime_Yearly_evolution_map

regression8 <- lm(Violent_Crime_Evolution_VS_CCTV$change_perc~Violent_Crime_Evolution_VS_CCTV$density_perc)
CCTV density VS Violent Crime evolution
Dependent variable: Violent Crime per Capita % change

Term                  Coefficient    (Std. error)
CCTV Density          0.312          (1.770)
intercept             26.600***      (5.250)

Observations          55
R2                    0.001
Adjusted R2           -0.018
Residual Std. Error   30.900 (df = 53)
F Statistic           0.031 (df = 1; 53)
Note: *p<0.1; **p<0.05; ***p<0.01
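To make the lack of significance concrete, the sketch below recomputes the test for the CCTV density coefficient from the rounded figures in the table above (estimate 0.312, standard error 1.770, 53 residual degrees of freedom). The variable names are illustrative, and the results are approximate because the inputs are rounded.

```r
# Rounded values copied from the regression table above.
estimate  <- 0.312
std_error <- 1.770
df_resid  <- 53

# t statistic for the null hypothesis that the slope is zero.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# With a single predictor, the F statistic equals the squared t statistic.
f_stat <- t_stat^2

print(c(t = t_stat, p = p_value, F = f_stat))
```

The squared t statistic reproduces the F statistic of 0.031 reported in the table, and the p-value is well above the 0.1 threshold.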


The last type of crime we must consider is property crime.

fviz_nbclust(sc.Community_data_Clustering[,c(8,3)], kmeans, method="wss")+
  geom_vline(xintercept = 5, linetype = 2) + # add line for better visualisation
  labs(subtitle = "Elbow method")  #We can determine the optimal number of cluster, 5 clusters seems to be reasonable



set.seed(30)
km.clust4 <- kmeans(sc.Community_data_Clustering[,c(8,3)], 5, nstart = 25) # nstart = 25 runs 25 random sets of initial centers and keeps the best solution; with the default (nstart = 1), only a single random start is tried.
fviz_cluster(km.clust4, data=sc.Community_data_Clustering[,c(8,3)], repel=TRUE) + labs(x="Property crime per capita",y="CCTV density",title = str_wrap(("With 5 clusters, we have a rather good ratio of 82.8% between between and within SS"),width=65),subtitle=str_wrap(("We will focus on areas in the high property crime, high CCTV density cluster as well as those in the high property crime, low CCTV density cluster"),width=80))



Downtown_Seton_Hill_evolution_Property <- as.vector(t(Community_data[14,c(57,55,53,51,49,47,45)]))

Downtown_Seton_Hill_Property <- data.frame(Downtown_Seton_Hill_evolution_Property,Year)

Downtown_Seton_Hill_map_Property <- ggplot(Downtown_Seton_Hill_Property,aes(x=Year,y=Downtown_Seton_Hill_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Downtown/Seton Hill",x="Year",y="Property Crime")

Oldtown_Middle_East_evolution_Property <- as.vector(t(Community_data[41,c(57,55,53,51,49,47,45)]))

Oldtown_Middle_East_Property <- data.frame(Oldtown_Middle_East_evolution_Property,Year)

Oldtown_Middle_East_map_Property <- ggplot(Oldtown_Middle_East_Property,aes(x=Year,y=Oldtown_Middle_East_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Oldtown/Middle East",x="Year",y="Property Crime")

Sandtown_Winchester_Harlem_Park_evolution_Property <- as.vector(t(Community_data[47,c(57,55,53,51,49,47,45)]))

Sandtown_Winchester_Harlem_Park_Property <- data.frame(Sandtown_Winchester_Harlem_Park_evolution_Property,Year)

Sandtown_Winchester_Harlem_Park_map_Property <- ggplot(Sandtown_Winchester_Harlem_Park_Property,aes(x=Year,y=Sandtown_Winchester_Harlem_Park_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Sandtown-Winchester/Harlem Park",x="Year",y="Property Crime")

Uptown_Druid_Heights_evolution_Property <- as.vector(t(Community_data[53,c(57,55,53,51,49,47,45)]))

Uptown_Druid_Heights_Property <- data.frame(Uptown_Druid_Heights_evolution_Property,Year)

Uptown_Druid_Heights_map_Property <- ggplot(Uptown_Druid_Heights_Property,aes(x=Year,y=Uptown_Druid_Heights_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Upton/Druid Heights",x="Year",y="Property Crime")

grid.arrange(Downtown_Seton_Hill_map_Property,Oldtown_Middle_East_map_Property,Sandtown_Winchester_Harlem_Park_map_Property,Uptown_Druid_Heights_map_Property,nrow=2,ncol=2,top="Four areas with HIGH property crime and HIGH CCTV density")

Property_Crime_Evolution_VS_CCTV <- Community_data[,c(1,57,47,4)] %>% 
  mutate(change_perc=((Community_data$PropertyCrimePer1000inhabitants19/Community_data$PropertyCrimePer1000inhabitants14)-1)*100)

knitr::kable(Property_Crime_Evolution_VS_CCTV[c(14,41,47,53),c(1,5)],col.names = c('Community','Property Crime per Capita % change')) %>% kable_styling(position = "center")
Community                          Property Crime per Capita % change
Downtown/Seton Hill                23.4
Oldtown/Middle East                15.4
Sandtown-Winchester/Harlem Park    -11.8
Upton/Druid Heights                4.3



Washington_Village_Pigtown_evolution_Property <- as.vector(t(Community_data[54,c(57,55,53,51,49,47,45)]))

Washington_Village_Pigtown_Property <- data.frame(Washington_Village_Pigtown_evolution_Property,Year)

Washington_Village_Pigtown_map_Property <- ggplot(Washington_Village_Pigtown_Property,aes(x=Year,y=Washington_Village_Pigtown_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Washington Village/Pigtown",x="Year",y="Property Crime")

Harbor_East_Little_Italy_evolution_Property <- as.vector(t(Community_data[26,c(57,55,53,51,49,47,45)]))

Harbor_East_Little_Italy_Property <- data.frame(Harbor_East_Little_Italy_evolution_Property,Year)

Harbor_East_Little_Italy_map_Property <- ggplot(Harbor_East_Little_Italy_Property,aes(x=Year,y=Harbor_East_Little_Italy_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Harbor East/Little Italy",x="Year",y="Property Crime")

Canton_evolution_Property <- as.vector(t(Community_data[5,c(57,55,53,51,49,47,45)]))

Canton_Property <- data.frame(Canton_evolution_Property,Year)

Canton_map_Property <- ggplot(Canton_Property,aes(x=Year,y=Canton_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Canton",x="Year",y="Property Crime")

Southeastern_evolution_Property <- as.vector(t(Community_data[49,c(57,55,53,51,49,47,45)]))

Southeastern_Property <- data.frame(Southeastern_evolution_Property,Year)

Southeastern_map_Property <- ggplot(Southeastern_Property,aes(x=Year,y=Southeastern_evolution_Property)) + 
  geom_line() +
  scale_y_continuous(limits = c(0, 125), breaks =seq(0,125,25)) + 
  labs(title = "Southeastern",x="Year",y="Property Crime")

grid.arrange(Washington_Village_Pigtown_map_Property,Harbor_East_Little_Italy_map_Property,Canton_map_Property,Southeastern_map_Property,nrow=2,ncol=2,top="Four areas with HIGH property crime and LOW CCTV density")

knitr::kable(Property_Crime_Evolution_VS_CCTV[c(5,26,49,54),c(1,5)],col.names = c('Community','Property Crime per Capita % change')) %>% kable_styling(position = "center")
Community                     Property Crime per Capita % change
Canton                        11.29
Harbor East/Little Italy      -49.67
Southeastern                  5.23
Washington Village/Pigtown    -5.38

With regard to property crime, it increased more often in areas with high CCTV density (three of the four areas) than in areas with low CCTV density (two of the four). This is despite the fact that property crime has declined city-wide. Again, with only four areas per group, these comparisons are not statistically significant; they simply allow us to see whether a trend is observable. The regression between CCTV density and property crime evolution likewise shows no meaningful correlation (\(R^2\) of 2%), so CCTVs do not seem to be effective in reducing property crime either.

Property_Crime_Yearly_evolution_map <- crime_data_with_areas %>%
  filter(VIO_PROP_CFS=="PROPERTY") %>% 
  count(year=floor_date(CrimeDateTime,"year")) %>% 
  ggplot(aes(year,n))+geom_line()+
  scale_x_date(limits = c(as.Date("2014-01-01"), as.Date("2020-12-31"))) +
  labs(title = str_wrap(("Overall, property crime seems to have decreased for the 2017 to 2020 period"),width=60),subtitle=str_wrap(("This explains why we should be careful when considering CCTV effectiveness in deterring property crime"),width=75),x="Year",y="Property crime occurrences")

Property_Crime_Yearly_evolution_map

regression9 <- lm(Property_Crime_Evolution_VS_CCTV$change_perc~Property_Crime_Evolution_VS_CCTV$density_perc)


CCTV density VS Property Crime evolution
Dependent variable: Property Crime per Capita % change

Term                  Coefficient    (Std. error)
CCTV Density          1.330          (1.290)
intercept             -12.400***     (3.830)

Observations          55
R2                    0.020
Adjusted R2           0.001
Residual Std. Error   22.500 (df = 53)
F Statistic           1.070 (df = 1; 53)
Note: *p<0.1; **p<0.05; ***p<0.01



None of the comparisons between CCTV density and different crime type evolution seem to indicate significant relationships. Thus, it seems reasonable to conclude that in addition to being ineffective in reducing crime in general, CCTVs do not seem particularly effective in preventing any particular type of crime. However, from this analysis we found that violent crime appears to be an important determinant of CCTV density. This finding will be used to answer our fourth research question.

4.3 Research question 3 - Is the impact of CCTV on crime reduction higher/lower/same in higher income neighborhoods compared to lower income neighborhoods?

Since the analysis above did not establish that CCTVs are effective, our third research question cannot be answered meaningfully. What we can still investigate is how crime has evolved in each area depending on its poverty level. Indeed, it is interesting to see whether, for example, inequalities in terms of security between richer and poorer areas have appeared over time.

Evolution_VS_poverty <- Community_data[,c(25,15,5)] %>% 
  mutate(change=(Community_data$CrimePer1000inhabitants19/Community_data$CrimePer1000inhabitants14)-1)


ggplot(data=Evolution_VS_poverty,mapping= aes(x=change,y=hhpov19))+
  geom_point()+
  geom_smooth(method = lm,color="blue",size=0.3)+
  labs(x="Change in crime per capita between 2014 and 2019",y="% of people living below the poverty line",title = str_wrap(("Change in crime per capita and poverty do not really seem to be correlated"),width = 60),subtitle = str_wrap(("Although the regression line might suggest that crime has fallen more in richer areas than in poorer areas, the regression output below indicates a very weak correlation"),width = 70))

Evolution_VS_poverty_regr <- lm(Evolution_VS_poverty$hhpov19~Evolution_VS_poverty$change)



Crime evolution VS poverty level
Dependent variable: Poverty level

Term                         Coefficient    (Std. error)
Crime per Capita % change    14.400*        (7.520)
intercept                    16.200***      (1.490)

Observations          55
R2                    0.065
Adjusted R2           0.047
Residual Std. Error   10.900 (df = 53)
F Statistic           3.670* (df = 1; 53)
Note: *p<0.1; **p<0.05; ***p<0.01



The very low \(R^2\) (6.48%) indicates a weak correlation between the change in crime per capita between 2014 and 2019 and the poverty metric used, and the slope is significant only at the 10% level. Inequalities in terms of security do not seem to have appeared.

4.4 Research question 4 - Are there more public cameras in poorer areas compared to wealthier areas?

We want to see whether there is a correlation between CCTV density and poverty level. As explained in the introduction, CCTVs pose several privacy issues. One of our initial hypotheses was that the government is more respectful of the privacy of wealthier people. As before, we perform a simple linear regression. The results are not very conclusive: although the coefficient on poverty level is statistically significant, both the \(R^2\) (25.8%) and the adjusted \(R^2\) (24.4%) are low. The next sub-section illustrates this phenomenon on a map.

CCTV_VS_poverty <- CCTV_per_area %>% 
  left_join(poverty_data,by=c("Community"="CSA2010"))

regression2 <- lm(CCTV_VS_poverty$density_perc~CCTV_VS_poverty$hhpov19)

CCTV vs poverty
Dependent variable: CCTV Density

Term                  Coefficient    (Std. error)
Poverty level         0.106***       (0.024)
intercept             0.052          (0.485)

Observations          56
R2                    0.258
Adjusted R2           0.244
Residual Std. Error   2.050 (df = 54)
F Statistic           18.700*** (df = 1; 54)
Note: *p<0.1; **p<0.05; ***p<0.01


ggplot(data=CCTV_VS_poverty,mapping= aes(x=hhpov19,y=density_perc)) + 
  labs(title = str_wrap(("As poverty level increases, CCTV density tends to increase slightly as well"),width = 65), x="% of people living below the poverty line",y="CCTV Density")+
  geom_point() + 
  geom_smooth(method = lm,color="blue",size=0.3)



4.4.1 Mapping of CCTVs and wealth

Looking at the data on a map is again instructive. The methodology to create the map is the same as before: we check that the community names match perfectly, merge the data using left_join and finally create the map using tmap. While the simple linear regression was not very conclusive, the map reveals interesting patterns and partially explains why we obtained such a weak result. First, areas with no CCTVs are often quite wealthy (e.g. in the northern part of Baltimore). The map also shows that many CCTVs are located in Downtown/Seton Hill and in Inner Harbor/Federal Hill, that is, in the centre of Baltimore, and that the poverty level is very low in these two areas. This makes sense, as city centres are typically not a city’s poorest areas, and it might have contributed to the low \(R^2\) we obtained before.
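The key-matching check that precedes the merge can be sketched in base R as follows. The two vectors of community names are hypothetical stand-ins for the map's and the data table's key columns.

```r
# Hypothetical keys, for illustration only.
map_keys  <- c("Canton", "Cherry Hill", "Downtown/Seton Hill")
data_keys <- c("Cherry Hill", "Canton", "Downtown/Seton Hill")

# TRUE only if every map key has a partner in the data table.
all_matched <- all(map_keys %in% data_keys)

# Keys present on one side only; both should be empty for a perfect match.
missing_in_data <- setdiff(map_keys, data_keys)
missing_in_map  <- setdiff(data_keys, map_keys)

print(all_matched)
print(missing_in_data)
print(missing_in_map)
```

Collapsing the check with `all()` and listing mismatches with `setdiff()` avoids having to scan a long vector of TRUE/FALSE values by eye.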

baltimore$community %in% poverty_data$CSA2010 #We see that we have a perfect match

baltimore@data <- left_join(baltimore@data, poverty_data, by = c('community' = 'CSA2010'))
Wealth_and_CCTV_map <- tm_shape(baltimore)+tm_fill(col = "hhpov19", title ="% of people living below the poverty line",style = "quantile") + tm_borders(col="black",alpha=0.3) + tm_layout(inner.margins = 0.05)+ tm_shape(balt_dat) + tm_dots(col="black") 

We see a high concentration of CCTVs in the centre.


4.4.2 An attempt to understand what determines CCTV density

We have seen that poverty level does not really seem to be a major determinant of CCTV density in an area. Therefore, we decided to build a multiple regression model containing various variables to investigate what could actually be the main determinant of CCTV density. This enables us to see whether city officials decide where to install CCTVs based on, for example, socio-economic, educational or racial factors. We retrieved six new data sets from Baltimore’s open data portal. All these data sets were already tidy and structured in the same way: they contain one observation per area for several time periods. The first one provides the percentage of persons between the ages of 16 and 64 who are currently unemployed. The second one provides the percentage of households without an internet subscription at home. Then, we retrieved two data sets about education: HSAexam provides the percentage of high school students who have successfully passed the H.S.A. exams, while Readiness gives the percentage of kindergarten children tested within an area in a school year whose composite score indicates full school readiness. Finally, we retrieved two data sets giving the share of persons who identify themselves as racially white and as black or African American, respectively, out of the total number of persons living in an area.

After importing all the data sets, we created an object called fullmod, a multiple regression containing 8 independent variables to predict CCTV density: the six above-mentioned indicators, poverty level and violent crime per capita. We also created nullmod, a model that contains only an intercept. With these two models, we performed a forward stepwise regression based on the AIC criterion using step. This function starts with the null model, the simplest model we can imagine, and adds independent variables one at a time until no further addition lowers the AIC.

We then use vif to check whether we have a multicollinearity issue, which arises when several independent variables explain the same variance. A VIF score above 5 is commonly taken to indicate severe multicollinearity. In the case at hand, two independent variables have a VIF score above 5: the two racial indicators seem to explain the same variance. A simple linear regression between the percentage of white people and the percentage of black people indeed shows that these two variables explain each other very well. We also ran a linear regression between the percentage of white people and CCTV density and saw that it is a poor predictor of CCTV density. We therefore dropped the two racial metrics and performed another stepwise regression.

The results of this second stepwise regression indicate that only violent crime, unemployment, HSA passing and poverty level should be included as independent variables. We have no multicollinearity this time and obtain an \(R^2\) of 58.6%. Yet out of the four partial regression coefficients, only two are significantly different from zero: violent crime and unemployment. Therefore, the very last thing we did was to perform another multiple linear regression containing only these two variables. When doing so, we obtain a satisfactory \(R^2\) of 53.1%, yet unemployment in turn becomes insignificant.
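The forward selection procedure described above can be sketched on simulated data. Everything in this snippet (the data frame `toy`, the predictors `x1` to `x3` and the response `y`) is hypothetical. Only `x1` truly drives `y`, so the procedure should retain it.

```r
set.seed(42)
n <- 60
# Hypothetical predictors; only x1 truly drives y in this toy example.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + rnorm(n)

null_model <- lm(y ~ 1, data = toy)
full_model <- lm(y ~ x1 + x2 + x3, data = toy)

# Forward selection: start from the intercept-only model and add the term
# that lowers AIC the most, stopping when no addition lowers it further.
selected <- step(
  null_model,
  scope = list(lower = formula(null_model), upper = formula(full_model)),
  direction = "forward",
  trace = 0
)

print(formula(selected))
```

Setting `trace = 0` only silences the step-by-step log; leaving it at its default shows the AIC of each candidate addition.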

Source of the data sets:

[https://arcg.is/0KOH1q]
[https://arcg.is/1znCqm0]
[https://arcg.is/1LiHnn0]
[https://arcg.is/1nOK9X]
[https://arcg.is/05LyTO]
[https://arcg.is/1vqyTu0]

#Socio-economic indicators:

Unemployment <- read.csv(file = here::here("data/Unemployment_Rate.csv"))

Unemployment[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0,0,0,0,0,0,0,0)

Internet_data <- read.csv(file = here::here("data/Percent_of_Households_with_No_Internet_at_Home.csv"))

Internet_data[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0)

#Educational indicators:

HSAexam <- read.csv(file = here::here("data/Percentage_of_Students_Passing_H.S.A._Government.csv"))

HSAexam[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0,0)

Readiness <- read.csv(file = here::here("data/Kindergarten_School_Readiness.csv"))

Readiness[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0)

#Racial indicators

Caucasian_perc <- read.csv(file = here::here("data/Percent_of_Residents_-_White_Caucasian_(Non-Hispanic).csv"))

Caucasian_perc[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0,0,0,0,0)

AfroAm_perc <- read.csv(file = here::here("data/Percent_of_Residents_-_Black_African-American_(Non-Hispanic).csv"))

AfroAm_perc[56,] <- list(56,"Unassigned -- Jail",0,0,0,0,0,0,0,0,0)

Community_data <- Community_data %>% 
  left_join(HSAexam[,c(2,6)],by=c("Community"="CSA2010")) %>%
  left_join(Internet_data[,c(2,5)],by=c("Community"="CSA2010")) %>% 
  left_join(Readiness[,c(2,5)],by=c("Community"="CSA2010")) %>% 
  left_join(Unemployment[,c(2,12)],by=c("Community"="CSA2010"))%>% 
  left_join(Caucasian_perc[,c(2,9)],by=c("Community"="CSA2010"))%>% 
  left_join(AfroAm_perc[,c(2,9)],by=c("Community"="CSA2010"))
fullmod <- lm(Community_data$density_perc~Community_data$ViolentCrimePerCapitaPerArea + Community_data$hhpov19 +Community_data$hsagov14 + Community_data$nohhint19+Community_data$ready13+Community_data$unempr19+Community_data$pwhite20+Community_data$paa20)

nullmod <- lm(Community_data$density_perc~1)

selAIC <- step(nullmod,scope=list(lower=nullmod,upper=fullmod),direction="forward")
#We have a satisfactory R-Squared
First selected model after a forward stepwise regression, 4 independent variables are retained
Dependent variable: CCTV Density

Term                                    Coefficient    (Std. error)
Violent Crime (per 1000 inhabitants)    0.008***       (0.001)
HS exam passing                         -0.068***      (0.020)
% of Caucasian                          0.065**        (0.025)
% of Afro-American                      0.028*         (0.016)
intercept                               0.131          (1.270)

Observations          56
R2                    0.608
Adjusted R2           0.577
Residual Std. Error   1.530 (df = 51)
F Statistic           19.800*** (df = 4; 51)
Note: *p<0.1; **p<0.05; ***p<0.01
library(car)
vif(selAIC) #Yet, the issue is that we have multicollinearity

lm(Community_data$paa20~Community_data$pwhite20)
summary(lm(Community_data$paa20~Community_data$pwhite20)) #This illustrates the multicollinearity issue: the percentage of white people is highly correlated with the percentage of black people

lm(Community_data$density_perc~Community_data$pwhite20+Community_data$paa20)
summary(lm(Community_data$density_perc~Community_data$pwhite20+Community_data$paa20)) #We observe that the percentage of white people is negatively correlated with CCTV density while the percentage of black people is positively correlated. However, none of the results are significant.

lm(Community_data$density_perc~Community_data$pwhite20)
summary(lm(Community_data$density_perc~Community_data$pwhite20))

lm(Community_data$density_perc~Community_data$paa20)
summary(lm(Community_data$density_perc~Community_data$paa20)) #Overall, this enables us to conclude that the race does not seem to be such a good predictor
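To show what a VIF score measures, the sketch below computes it by hand from its textbook definition, 1 / (1 - R²), where R² comes from regressing one predictor on the others. It is an illustration on simulated data, not a replacement for the `vif` call above. The variables `a`, `b` and `c_var` are hypothetical, with `b` built as a near mirror image of `a`, much like two population shares that almost sum to 100.

```r
set.seed(7)
n <- 100
# Hypothetical predictors: b is almost the mirror image of a, c_var is unrelated.
a <- runif(n, 0, 100)
b <- 100 - a + rnorm(n, sd = 5)
c_var <- rnorm(n)
preds <- data.frame(a = a, b = b, c_var = c_var)

# VIF of predictor j = 1 / (1 - R^2 of regressing j on the other predictors).
vif_manual <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  r2 <- summary(lm(reformulate(others, response = v), data = preds))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

The two mirrored predictors get large scores while the unrelated one stays near 1, which is the pattern that motivates dropping one or both of a highly collinear pair.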
fullmod2 <- lm(Community_data$density_perc~Community_data$ViolentCrimePerCapitaPerArea + Community_data$hhpov19 +Community_data$hsagov14 + Community_data$nohhint19 + Community_data$ready13 + Community_data$unempr19)

nullmod <- lm(Community_data$density_perc~1)

selAIC2 <- step(nullmod,scope=list(lower=nullmod,upper=fullmod2),direction="forward")
summary(selAIC2) 
Second selected model after a forward stepwise regression, 4 independent variables are retained
Dependent variable: CCTV Density

Term                                    Coefficient    (Std. error)
Violent Crime (per 1000 inhabitants)    0.009***       (0.002)
HS exam passing                         -0.022         (0.014)
Unemployment rate                       -0.126**       (0.055)
Poverty level                           0.049          (0.030)
intercept                               0.838          (1.150)

Observations          56
R2                    0.586
Adjusted R2           0.554
Residual Std. Error   1.570 (df = 51)
F Statistic           18.100*** (df = 4; 51)
Note: *p<0.1; **p<0.05; ***p<0.01
#We see that internet access information and readiness are not even included and that among the included variables, only two are significant: violent crime and unemployment rate. This enables us to conclude that like racial metrics, educational metrics are poor predictors of CCTV levels.

library(car)
vif(selAIC2) #We can try to make a multiple regression containing the only two significant variables: 

CCTV_VS_ViolentCrime_and_Unemp <- lm(Community_data$density_perc~Community_data$ViolentCrimePerCapitaPerArea+Community_data$unempr19)
Multiple regression with only unemployment and violent crime
Dependent variable: CCTV Density

Term                                    Coefficient    (Std. error)
Violent Crime (per 1000 inhabitants)    0.010***       (0.001)
Unemployment rate                       -0.061         (0.047)
intercept                               -0.818         (0.494)

Observations          56
R2                    0.531
Adjusted R2           0.513
Residual Std. Error   1.640 (df = 53)
F Statistic           30.000*** (df = 2; 53)
Note: *p<0.1; **p<0.05; ***p<0.01



Overall, this leads us to think that violent crime per capita in a given area is the main element that determines CCTV density. Racial metrics, educational metrics and socio-economic indicators all turned out to be rather poor predictors. The results of the multiple regressions therefore suggest that the government does not seem to specifically target certain types of people and, most importantly, does not seem to specifically target the least privileged members of society, such as poorer, unemployed or less educated people. Yet this statement must be qualified. With an R-squared of approximately 50%, we can only explain about half of the variation in CCTV density; the origin of the other half is unknown to us.

Conclusion

Our efforts to analyse the data obtained on Baltimore’s open data portal have enabled us to not only observe some interesting phenomena but also to relate them and draw conclusions. First, we were able to observe the distribution of surveillance cameras in Baltimore City and found that they were mostly concentrated in the city centre. We also found that crime per capita was highest in the inner city, suggesting a correlation between crime and CCTV. Analysing crime by type, we found some slight differences by communities: communities on the east side of the city seem to be slightly more affected by less severe crime and property crime, while communities on the west side of Baltimore seem to be more affected by more severe crime and violent crime. Moreover, despite low statistical significance, it seems to have been detected that poorer areas tend to be more affected by crime in general. After this rather spatial analysis of crime, a temporal analysis revealed that over the period 2014-2019, crime (in absolute numbers) had increased slightly, although there has been a downward trend since a peak in 2017. We also observed that crime was seasonal in Baltimore and that crime tended to decline in winter. Finally, we found that over the 2014-2019 period, felonies increased as did violent crime, while misdemeanours and property crime decreased. This suggests that more violent and severe crimes have increased while less severe and property crimes have decreased.
These exploratory analyses provided the basis for answering our research questions. We first investigated whether CCTVs were effective in deterring crime by looking at the relationship between crime evolution and CCTV density. This analysis showed that there did not appear to be a significant relationship between these two variables. By comparing the evolution of crime in areas with a high CCTV density with areas with a low CCTV density, we were not able to demonstrate a trend. A more spatial analysis of the phenomenon allowed us to confirm this conclusion by showing that a lot of crime was committed, sometimes right in front of the surveillance cameras. We used the same methodology to investigate whether surveillance cameras could have a specific impact on certain types of crime. We were unable to find a significant correlation between the evolution of any of the four types of crime observed and CCTV density, suggesting that CCTV cameras were not particularly effective against any specific type of crime either. However, one of the findings of this analysis is that there is a particularly strong correlation between CCTV density and violent crime per capita. The failure to demonstrate the effectiveness of CCTVs on crime made it impossible to answer our third research question. Nevertheless, we wanted to examine how crime evolved in each area according to its level of poverty in order to see if we could detect inequalities in terms of the ‘right to safety’. Our regression results indicate that there is no correlation between changes in crime and poverty level. Finally, we tried to analyse what could determine the CCTV density of an area. Indeed, as mentioned in the introduction, human rights activists are concerned that the most disadvantaged people do not enjoy the same privacy as the rest of the population. We found that there is indeed a correlation between poverty level and CCTV density. However, the relationship is relatively weak.
Still with the aim of finding out what determines the CCTV density of a given area, we performed a stepwise regression containing many different variables. We came to the conclusion that the most important determinant of CCTV density was not race, education or socio-economic condition, but rather violent crime per capita. This result is somewhat reassuring, although we have just highlighted that CCTVs were not particularly effective in deterring violent crime.

Take Home Message:

Some of the fears of leading human rights advocates seem to be justified, others less so. Indeed, given that the effectiveness of surveillance cameras is difficult to demonstrate and that the gain in security generated by them is consequently limited in relation to the loss of privacy they cause, the principle of proportionality does not seem to be respected. It is therefore legitimate to doubt the validity of massive data collection under the guise of protecting the population. However, with regard to their fears that the government is systematically and specifically targeting certain members of the population who are already disadvantaged, it is impossible for us to confirm this with the data available to us.
Based on the results of our analysis, we believe that it is essential for the Baltimore Police Department to be transparent towards citizens. Indeed, we believe that citizens would be justified in demanding the following from their authorities:

  • a report demonstrating that surveillance cameras and the massive collection of citizens’ data can still have a positive impact. Indeed, although surveillance cameras do not appear to be effective in deterring crime, they could potentially have other positive effects, such as facilitating the arrest of criminals. It would be necessary to verify such potential effects.

  • a detailed report explaining the criteria on which the government bases its decision to install a surveillance camera.

Limitations:

Obviously, our work has its limitations. The main one is the limited number of surveillance cameras in our original data set. As mentioned in section 2.2, one of the columns of our data set is called “CAM_NUM”. Since not all camera numbers appear in the data set, it seems that only a fraction of the surveillance cameras installed in Baltimore are listed. It would make sense that the Baltimore Police Department does not publicly disclose the exact location of all its CCTVs. Furthermore, 836 cameras seems rather few for a city the size of Baltimore, and some communities do not have any CCTVs at all according to the data set. This is problematic because it limits the significance of our regressions, for example. It is also worth mentioning that we ran our regressions with only 56 observations at a time, which makes it difficult to obtain statistically significant results and makes our conclusions more fragile. A higher number of observations would have been desirable.
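The effect of the small sample can be illustrated with the standard t statistic for a Pearson correlation, t = r · sqrt((n − 2) / (1 − r²)). The sketch below is only an illustration: the correlation value of 0.2 is an assumed, made-up figure rather than one of our results, and `correlation_t_statistic` is a hypothetical helper name.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r differs from zero,
    based on n observations (n - 2 degrees of freedom)."""
    if n <= 2:
        raise ValueError("at least 3 observations are required")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Same assumed correlation (0.2), different sample sizes.
# 56 matches the number of observations per regression; the others are hypothetical.
for n in (56, 200, 1000):
    t = correlation_t_statistic(0.2, n)
    print(f"n = {n:4d}  t = {t:.2f}")
```

With 56 observations, a correlation of 0.2 gives t = 1.5, below the value of roughly 2 usually required for significance at the 5% level. The same correlation would yield a larger t statistic with more observations, which is why weak relationships are hard to establish with our sample size.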

Another limitation is the lack of information on when the surveillance cameras were installed. Indeed, this would have allowed us to more accurately assess the effectiveness of CCTVs but also to answer a question we would have liked to answer, namely: can CCTVs generate crime displacement issues?

Finally, it is also important to mention that, when answering our fourth research question using violent crime per capita as the sole independent variable, we were only able to explain approximately 50% of the variance of the CCTV density of an area. While this result is encouraging, it still means that there is another half of the variance that we cannot explain with the data we have. It is therefore possible that the government is indeed systematically targeting a section of the population and that we simply have not found the criteria on which it does so. Indeed, the spread of surveillance cameras in Baltimore does not seem to be random, as one can discern certain patterns (e.g. certain areas are particularly densely covered with cameras, and certain cameras seem to be aligned along certain streets).

Future work:

We think that it would be interesting to carry out the same analyses with a data set comprising all of the city’s surveillance cameras in order to check whether similar conclusions are obtained. Also, access to more information, such as camera installation dates, would allow us to confirm or reject the hypothesis that surveillance cameras could displace crime.
Secondly, it would be of particular interest to carry out similar analyses for other cities and to examine whether it is possible to create a model capable of ‘predicting’ CCTV density as a function of crime levels. One could, for instance, build a model on our Baltimore data set, treating it as the training set, and then apply this model to the data set of a different city.
Finally, we believe that it would be very important to analyse the impact that alternative and less privacy-threatening measures to deter crime could have. Examples include prevention campaigns aimed at nudging people into doing good by displaying posters with motivational messages (e.g. “Do to others how you yourself want to be treated”), the establishment of youth centres aimed at preventing juvenile crime and gang enrolment, or the establishment of “neighbourhood watch” crime prevention programmes.